<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-8-r157</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Teschendorff</snm>
               <mi>E</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
               <email>aet21@cam.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Miremadi</snm>
               <fnm>Ahmad</fnm>
               <insr iid="I2"/>
               <email>am643@cam.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Pinder</snm>
               <mi>E</mi>
               <fnm>Sarah</fnm>
               <insr iid="I2"/>
               <email>sep43@cam.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Ellis</snm>
               <mi>O</mi>
               <fnm>Ian</fnm>
               <insr iid="I3"/>
               <email>ian.ellis@nottingham.ac.uk</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Caldas</snm>
               <fnm>Carlos</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>cc234@cam.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Breast Cancer Functional Genomics Laboratory, Cancer Research UK Cambridge Research Institute and Department of Oncology, University of Cambridge, Robinson Way, Cambridge CB2 0RE, UK</p>
            </ins>
            <ins id="I2">
               <p>Cambridge Breast Unit, Addenbrookes Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ, UK</p>
            </ins>
            <ins id="I3">
               <p>Histopathology, Nottingham City Hospital NHS Trust and Department of Pathology, University of Nottingham, Nottingham NG5 1PB, UK</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>8</issue>
         <fpage>R157</fpage>
         <url>http://genomebiology.com/2007/8/8/R157</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17683518</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-8-r157</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>14</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>25</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>2</day>
               <month>8</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>02</day>
               <month>08</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Teschendorff et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Breast cancer prognosis</p>
      </shorttitle>
      <shortabs>
         <p>A feature selection method was used in an analysis of three major microarray expression datasets to identify molecular subclasses and prognostic markers in estrogen receptor-negative breast cancer, showing that it is a heterogeneous disease with at least four main subtypes.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Estrogen receptor (ER)-negative breast cancer specimens are predominantly of high grade, have frequent p53 mutations, and are broadly divided into HER2-positive and basal subtypes. Although ER-negative disease has overall worse prognosis than does ER-positive breast cancer, not all ER-negative breast cancer patients have poor clinical outcome. Reliable identification of ER-negative tumors that have a good prognosis is not yet possible.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We apply a recently proposed feature selection method in an integrative analysis of three major microarray expression datasets to identify molecular subclasses and prognostic markers in ER-negative breast cancer. We find a subclass of basal tumors, characterized by over-expression of immune response genes, which has a better prognosis than the rest of ER-negative breast cancers. Moreover, we show that, in contrast to ER-positive tumors, the majority of prognostic markers in ER-negative breast cancer are over-expressed in the good prognosis group and are associated with activation of complement and immune response pathways. Specifically, we identify an immune response-related seven-gene module and show that downregulation of this module confers greater risk for distant metastasis (hazard ratio 2.02, 95% confidence interval 1.2-3.4; <it>P </it>= 0.009), independent of lymph node status and lymphocytic infiltration. Furthermore, we validate the immune response module using two additional independent datasets.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We show that ER-negative basal breast cancer is a heterogeneous disease with at least four main subtypes. Furthermore, we show that the heterogeneity in clinical outcome of ER-negative breast cancer is related to the variability in expression levels of complement and immune response pathway genes, independent of lymphocytic infiltration.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010003">Cancer</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010011">Immunology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>It is widely recognized that estrogen receptor (ER)-positive (ER<sup>+</sup>) and ER-negative (ER<sup>-</sup>) breast cancers are two different disease entities. Generally, ER<sup>- </sup>tumors tend to be of high grade, are more frequently p53 mutated, and have a worse prognosis than ER<sup>+ </sup>disease. Moreover, while ER<sup>+ </sup>disease can be treated with hormone therapy, the only targeted therapy available for ER<sup>- </sup>patients is a monoclonal antibody that binds to the <it>ERBB2 </it>receptor and that is effective only for those ER<sup>- </sup>tumors with <it>HER2</it>/<it>ERBB2 </it>over-expression.</p>
         <p>In spite of these clinical advances, ER<sup>+ </sup>and ER<sup>- </sup>breast cancers remain heterogeneous diseases, and little is known regarding why patients with the same histopathologic characteristics may have widely different clinical outcomes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. This is particularly true for the basal subtype of ER<sup>- </sup>breast cancer, which is commonly defined by over-expression of cytokeratin markers (CK 5/6 and CK 14) and which is often also HER2 negative (HER2<sup>-</sup>) <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Most recent efforts to obtain a molecular understanding of the observed heterogeneity have focused on ER<sup>+ </sup>breast cancer, where gene expression signatures that are either prognostic or predictive of response to hormone therapy have been derived <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. In contrast, few studies have thus far attempted to derive a prognostic signature within ER<sup>- </sup>breast cancer. Although cytokeratin markers have been shown to correlate with poor prognosis in breast cancer <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, an attempt to correlate basal markers with survival within ER<sup>- </sup>disease has shown that these markers were not predictive of outcome <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Based on our work presented here, it appears that the prognostic 'signal' in ER<sup>- </sup>breast cancer is much weaker than that in ER<sup>+ </sup>disease. This precludes the use of traditional supervised approaches, which assume a sufficiently low false discovery rate (FDR) for deriving gene expression based classifiers. A similar observation was reported by others <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>Recently, we proposed a novel feature selection method, Profile Analysis using Clustering and Kurtosis (PACK) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, which selects genes using a pattern recognition method and may significantly reduce the FDR. Using PACK in an integrated cohort of 186 ER<sup>- </sup>samples and 1,200 genes, we were able to identify distinct molecular subtypes, including a good prognosis subclass characterized by over-expression of immune response genes. However, these results were not validated in external cohorts.</p>
         <p>The purpose of this work is to extend our preliminary findings <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> by applying PACK to the same three breast cancer datasets <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr><abbr bid="B18">18</abbr></abbrgrp>, but now using a much larger set of common genes, and to validate our findings further using two external independent cohorts <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B19">19</abbr></abbrgrp>. The larger gene set was obtained by imputing missing data in our own study <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, which rescued many more genes and raised the overlap with the other two arrays to approximately 5,000 genes. More generally, our goal is to elucidate the molecular taxonomy of ER<sup>- </sup>breast cancer and, if possible, to find different prognostic subclasses and the corresponding prognostic markers.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>The FDR is higher in ER<sup>- </sup>breast cancer</p>
            </st>
            <p>To understand why it has not been possible to derive a validated prognostic signature in ER<sup>- </sup>breast cancer using conventional approaches, we compared the FDR for ER<sup>+ </sup>and ER<sup>- </sup>disease. As cohorts we used integrated datasets obtained by merging three of the largest profiled breast cancer cohorts <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr><abbr bid="B18">18</abbr></abbrgrp> using the z-score transformation, a procedure that we validated previously <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Briefly, the z-score transformation shifts the mean of each gene expression vector in each cohort to zero, while scaling its variance to unity. The transformed gene expression vectors are then merged across cohorts. This merging step resulted in integrated expression matrices of 186 ER<sup>- </sup>and 527 ER<sup>+ </sup>tumors profiled over a common set of 5,007 genes. To enable the comparison, we selected at random 186 ER<sup>+ </sup>tumors from the 527 available. We then used the univariate Cox proportional hazards model, with time to distant metastasis (TTDM) and overall survival as end-points, to obtain <it>P </it>values for all genes. Next, we estimated, for each choice of significance threshold, the number of false positives using the q-value approximation <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The numbers of significant genes and false positives as a function of the significance threshold were then plotted for ER<sup>+ </sup>and ER<sup>- </sup>breast cancer and for the two end-points (Figure <figr fid="F1">1</figr>). We verified that the curves for ER<sup>+ </sup>breast cancer were robust to random selections of the 186 ER<sup>+ </sup>tumors. 
This comparison showed that the FDR is much higher in ER<sup>- </sup>tumors and motivated us to develop a different feature selection approach based on a pattern recognition algorithm (PACK) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
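            <p>As a minimal illustrative sketch (not the code used in this study), the per-cohort z-score merging and the false-positive estimate can be written as follows. The function names, the use of the sample standard deviation, and the fixed tuning value lam = 0.5 for estimating the proportion of null genes are assumptions made for illustration; the q-value procedure of <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is more elaborate than this fixed-value version.</p>
            <p><![CDATA[
```python
from statistics import mean, stdev


def zscore(values):
    """Centre one gene's values within one cohort and scale to unit variance."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator): an assumption
    if s == 0:
        raise ValueError("constant profile cannot be z-scored")
    return [(v - m) / s for v in values]


def merge_cohorts(cohorts):
    """Merge cohorts (each a dict: gene -> per-sample values) over common genes."""
    common = set(cohorts[0])
    for cohort in cohorts[1:]:
        common &= set(cohort)
    merged = {}
    for gene in sorted(common):
        merged[gene] = []
        for cohort in cohorts:
            # z-score within each cohort first, then concatenate across cohorts
            merged[gene].extend(zscore(cohort[gene]))
    return merged


def estimate_false_positives(p_values, threshold, lam=0.5):
    """Return (number significant, expected false positives, estimated FDR)."""
    m = len(p_values)
    # Proportion of null genes, estimated from the P values above lam
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    n_sig = sum(p <= threshold for p in p_values)
    expected_false = pi0 * m * threshold
    fdr = min(1.0, expected_false / n_sig) if n_sig else 0.0
    return n_sig, expected_false, fdr
```
]]></p>
            <p>Evaluating estimate_false_positives over a grid of thresholds, separately for the ER<sup>+ </sup>and ER<sup>- </sup>matrices, would give curves of the kind compared in Figure <figr fid="F1">1</figr>.</p>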
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>FDR comparison in ER<sup>- </sup>and ER<sup>+ </sup>breast cancer</p>
               </caption>
               <text>
                  <p>FDR comparison in ER<sup>- </sup>and ER<sup>+ </sup>breast cancer. For various significance thresholds (sigth), we plot the fraction of observed genes with <it>P </it>values less than the significance threshold (black) as well as the corresponding fraction of false positives, as estimated using a q value analysis (red). <b>(a) </b>Overall survival for ER<sup>+ </sup>breast cancer. <b>(b) </b>Overall survival for ER<sup>- </sup>breast cancer. <b>(c) </b>Time to distant metastasis for ER<sup>+ </sup>breast cancer. <b>(d) </b>Time to distant metastasis for ER<sup>- </sup>breast cancer. <it>P </it>values were obtained from the log-rank test using Cox regression models. ER, estrogen receptor; FDR, false discovery rate.</p>
               </text>
               <graphic file="gb-2007-8-8-r157-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Finding ER<sup>- </sup>subclasses using PACK</p>
            </st>
            <p>If the aim is to identify subclasses within a tumor type, then it is natural that unsupervised methods be applied to sample sets that are composed entirely of this tumor type. In fact, given the hierarchical ER<sup>+</sup>/ER<sup>- </sup>subdivision for breast cancer, it is questionable whether ER<sup>- </sup>subgroups can be correctly defined based on clusters that were derived by using both ER<sup>+ </sup>and ER<sup>- </sup>samples, as was done in other studies (for example <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>). To see this, assume that all tumor types and genes are used in the clustering algorithm. It is very likely that the clusters within ER<sup>- </sup>tumors reflect not only the interesting variability of genes within ER<sup>- </sup>tumors but also the variability of genes that are important for classification of ER<sup>+ </sup>tumors. The variability of these genes in ER<sup>- </sup>tumors may represent undesired noise, which, if not removed, can affect the inferred clusters. Thus, in order to identify relevant subtypes of ER<sup>- </sup>tumors more robustly, we decided to use ER<sup>- </sup>tumors only. Moreover, in view of the relatively high FDR in ER<sup>- </sup>disease, we decided to apply the PACK methodology, which has already been shown to provide a more reliable identification of molecular classifiers <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Because PACK requires large sample sizes (for small sample sizes, kurtosis estimates can have large variance and clustering algorithms that aim to predict the optimal number of clusters have a high false negative rate), we applied it to the integrated data matrix derived previously. Two additional independent ER<sup>- </sup>cohorts <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B19">19</abbr></abbrgrp> were left out to serve as validation studies. 
The five microarray datasets used are summarized in Table <tblr tid="T1">1</tblr> by platform type, number of ER<sup>- </sup>samples, and number of poor outcome events.</p>
            <tbl id="T1" hint_layout="double">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Breast cancer datasets used</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Ref.</p>
                     </c>
                     <c ca="left">
                        <p>Cohort name</p>
                     </c>
                     <c ca="left">
                        <p>Oligo microarray platform</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>ER<sup>-</sup></p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>ER<sup>+</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Death/distant metastasis (<it>n</it>)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Death/distant metastasis (<it>n</it>)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[18]</p>
                     </c>
                     <c ca="left">
                        <p>NKI2</p>
                     </c>
                     <c ca="left">
                        <p>Agilent</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>226</p>
                     </c>
                     <c ca="left">
                        <p>45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[5]</p>
                     </c>
                     <c ca="left">
                        <p>EMC</p>
                     </c>
                     <c ca="left">
                        <p>Affymetrix</p>
                     </c>
                     <c ca="left">
                        <p>77</p>
                     </c>
                     <c ca="left">
                        <p>27</p>
                     </c>
                     <c ca="left">
                        <p>208</p>
                     </c>
                     <c ca="left">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[9]</p>
                     </c>
                     <c ca="left">
                        <p>NCH</p>
                     </c>
                     <c ca="left">
                        <p>Agilent</p>
                     </c>
                     <c ca="left">
                        <p>40</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>93</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[19]</p>
                     </c>
                     <c ca="left">
                        <p>UPP</p>
                     </c>
                     <c ca="left">
                        <p>Affymetrix</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>213</p>
                     </c>
                     <c ca="left">
                        <p>49</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[7]</p>
                     </c>
                     <c ca="left">
                        <p>JRH-2</p>
                     </c>
                     <c ca="left">
                        <p>Affymetrix</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>ER, estrogen receptor.</p>
               </tblfn>
            </tbl>
            <p>Briefly, we review the concepts that underpin PACK (see Materials and methods, below, and Figure <figr fid="F2">2</figr>). The hypothesis is that genes that play an important role as classifiers or biomarkers are more likely to have expression profiles that are mixtures of gaussian distributions. On the other hand, false positives, in spite of their spurious association with a phenotype, are less likely to be described by a mixture of distributions. Thus, selecting genes based on whether they have structure in their expression profiles is likely to separate the relevant markers from those that are just false positives. Next, we propose to focus on those genes that define the largest subgroups. Although genes that define small subgroups are also of interest, it is natural to identify first those genes that define the largest subclasses. Such genes can be found from the inferred cluster sizes, but they are generally also characterized by a negative (or close to zero) kurtosis profile (see Materials and methods, below) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. As shown previously, negative kurtosis expression profiles are in effect a mixture of at least two (gaussian) distributions of approximately equal weights. Thus, by selecting those genes that have the most negative kurtosis expression profiles, we identify the markers that define the largest subclasses within the sample set (Figure <figr fid="F2">2</figr>). Many of these features will be highly correlated because they define almost the same subclasses. It follows that further application of traditional clustering algorithms over these negative kurtosis profiles will enable reliable identification of the major subclasses within the sample set. 
We note that because we are interested in the most negative kurtosis profiles and because the clusters in the individual gene profiles are only needed to study the cluster distribution of phenotypes, the cluster inference step on the individual gene profiles (known as PAC) can be performed after computation of kurtosis (known as PAK) on the selected subset of negative kurtosis profiles (Figure <figr fid="F2">2</figr>).</p>
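            <p>The kurtosis filter can be illustrated with a short sketch, given here under stated assumptions rather than as the implementation of <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. It uses the standard bias-corrected sample excess kurtosis, which may differ from the estimator defined in Materials and methods, and the function names and toy profiles are hypothetical. In the idealized limit of two perfectly separated groups, the excess kurtosis equals 1/(p(1 - p)) - 6, where p is the fraction of samples in the smaller group; this is negative only when p exceeds roughly 21%, which is in line with the subgroup size quoted in Figure <figr fid="F2">2</figr>.</p>
            <p><![CDATA[
```python
from statistics import fmean


def excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (zero expected for a gaussian)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four observations")
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m4 = fmean([(v - m) ** 4 for v in values])
    if m2 == 0:
        raise ValueError("constant profile has undefined kurtosis")
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def two_group_profile(n, minor_fraction, separation=1.0):
    """Toy profile: two perfectly separated groups (hypothetical helper)."""
    n_minor = round(n * minor_fraction)
    return [separation] * n_minor + [0.0] * (n - n_minor)


def select_negative_kurtosis(profiles):
    """Keep the genes (dict: gene -> values) whose excess kurtosis is negative."""
    return sorted(g for g, v in profiles.items() if excess_kurtosis(v) < 0)
```
]]></p>
            <p>In this toy setting, a profile that splits the samples 30%/70% has negative excess kurtosis and is retained, whereas a 10%/90% split has positive excess kurtosis and is discarded.</p>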
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>PACK flowchart</p>
               </caption>
               <text>
                  <p>PACK flowchart. <b>(a) </b>A schematic diagram of PACK, as used in this study. For each gene expression profile an unbiased estimate of its kurtosis, K, is computed. Genes with negative kurtosis are selected because only these define large subgroups (of sizes >22% of the total sample size). Further unsupervised clustering may then be performed on this subset of negative kurtosis profiles to find novel tumor subclasses. Alternatively, to find robust prognostic markers, negative kurtosis profiles are filtered further based on whether there is evidence of bimodality (C = 2). This step requires a cluster inference algorithm and a model selection criterion to discard those profiles that are best described by a single gaussian (C = 1; by random chance gaussian profiles may have negative kurtosis). Correlation with a categorical phenotype is then assessed with Fisher's test, which evaluates whether the distribution of the phenotype across the two clusters is significantly different from random. <b>(b) </b>Density curves of typical bimodal negative and positive kurtosis gene expression profiles. X-axis shows gene expression on a log2 scale. PACK, Profile Analysis using Clustering and Kurtosis.</p>
               </text>
               <graphic file="gb-2007-8-8-r157-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Distinct molecular subgroups of ER<sup>- </sup>breast cancer</p>
            </st>
            <p>Applying PAK to the integrated ER<sup>- </sup>data matrix of 5,007 genes, we found 813 genes with a negative kurtosis profile (Additional data file 1). Interestingly, applying the same analysis to ER<sup>+ </sup>breast cancer, we found far fewer negative kurtosis profiles (193; Additional data file 2), despite there being roughly twice as many bimodal profiles in ER<sup>+ </sup>breast cancer (about 4,500 in ER<sup>+ </sup>versus about 2,500 in ER<sup>-</sup>). We verified by explicit simulation that the significantly lower proportion of negative kurtosis profiles in ER<sup>+ </sup>disease could not be explained by the larger sample size in ER<sup>+ </sup>(527) compared with ER<sup>- </sup>disease (186; data not shown).</p>
            <p>Having identified the relevant features, we next clustered the ER<sup>- </sup>tumors over these. Using hierarchical clustering with Pearson correlation metric and complete linkage, we found that samples clustered into five main groups, each characterized by the expression patterns of four gene clusters that were found to be strongly enriched for specific gene ontologies (Figure <figr fid="F3">3a</figr> and Additional data file 3). One group was characterized by over-expression of <it>ERBB2</it>, the steroid hormone receptor <it>AR</it>, and genes related to steroid estrogen response (such as <it>GATA3</it>, <it>TFF1</it>, and <it>DNALI1</it>). This subtype is therefore similar to the recently proposed apocrine subclass <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>, which is characterized by over-expression of <it>AR </it>and genes that are either direct targets of ER or responsive to estrogen. Thus, we decided to call this ER<sup>- </sup>subtype (over-expressing steroid response genes) 'SR<sup>+</sup>', although it is clear that it also defines the well known HER2<sup>+ </sup>subtype.</p>
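            <p>A minimal sketch of complete-linkage hierarchical clustering with a correlation-based dissimilarity is given below, for illustration only. Taking the dissimilarity as one minus the Pearson correlation and cutting the tree at a fixed number of clusters are assumptions; the function names are hypothetical, and this is not the software used for Figure <figr fid="F3">3a</figr>.</p>
            <p><![CDATA[
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has undefined correlation")
    return sxy / sqrt(sxx * syy)


def complete_linkage(samples, n_clusters):
    """Cluster samples (dict: name -> profile over the selected genes)."""
    names = list(samples)
    # Dissimilarity 1 - r for every ordered pair of distinct samples
    dist = {
        (a, b): 1.0 - pearson(samples[a], samples[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```
]]></p>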
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Molecular subclasses in ER<sup>- </sup>breast cancer</p>
               </caption>
               <text>
                  <p>Molecular subclasses in ER<sup>- </sup>breast cancer. <b>(a) </b>Complete linkage hierarchical clustering of 186 ER<sup>- </sup>breast tumors over 813 genes with negative kurtosis profiles. Five sample clusters were identified and characterized in terms of the patterns of over-expression and under-expression of four gene clusters related to cell cycle (CC; blue), immune response (IR; red), extracellular matrix (ECM; green), and steroid hormone response (SR; pink) functions. Panels show the distribution of the SSP subtype [23], the lymphocytic infiltration score, histologic grade, basal marker [27], and <it>ERBB2</it><sup>+ </sup>amplifier subtype. Panel color codes: SSP (pink = HER2, brown = basal, dark green = normal, sky blue = luminal A, and blue = luminal B); LYM.INF (black = high, gray = low, and white = missing); GRADE (black = high, blue = intermediate, sky blue = low, and white = missing), BASAL.MARK. (black = high and white = low), ERBB2-AMP (black = high and white = low). The BASAL.MARK. profile represents an average over validated basal markers in [27], whereas the ERBB2-AMP profile was calculated as an average over three genes in the <it>ERBB2 </it>amplicon (<it>ERBB2</it>, <it>STARD3</it>, <it>GRB7</it>). <b>(b) </b>Kaplan-Meier curves for time to distant metastasis (years) and for the five subclasses identified in panel (a). <b>(c) </b>Partitioning around medoids clustering over the seven-gene prognostic immune response module. Panel color codes: purple = cluster over-expressing module, yellow = cluster under-expressing module, black = poor outcome samples, gray = good outcome samples, green = relative under-expression, and red = relative over-expression. <b>(d) </b>Kaplan-Meier curves for time to distant metastasis for the two groups identified in panel (c). Hazard ratio, 95% confidence interval, and log-rank test <it>P </it>values are shown. ER, estrogen receptor; SSP, single sample predictor.</p>
               </text>
               <graphic file="gb-2007-8-8-r157-3"/>
            </fig>
            <p>The other four groups were characterized mainly by absent or lower expression of these steroid response genes. One of these four clusters was characterized by over-expression of genes related to cell cycle and cell proliferation pathways (CC<sup>+</sup>), and another by over-expression of both cell cycle and immune response genes (CC<sup>+</sup>/IR<sup>+</sup>). Of the remaining two clusters, one was characterized by over-expression of extracellular matrix genes (ECM<sup>+</sup>) and the other by over-expression of immune response genes only (IR<sup>+</sup>).</p>
         </sec>
         <sec>
            <st>
               <p>Relation to the intrinsic subtype classification</p>
            </st>
            <p>In order to relate the five identified molecular subtypes to the intrinsic breast cancer classification <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B26">26</abbr></abbrgrp>, we used the recently validated single sample predictor (SSP) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> to classify the 186 ER<sup>- </sup>samples into the various intrinsic subtypes (Figure <figr fid="F3">3a</figr> and Table <tblr tid="T2">2</tblr>). In addition, we studied the expression profiles of recently validated basal markers <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and genes in the <it>ERBB2 </it>amplicon across the five identified clusters (Figure <figr fid="F3">3a</figr> and Additional data file 4). Based on these figures and Table <tblr tid="T2">2</tblr>, we could draw the following conclusions. First, the SR<sup>+ </sup>cluster was highly correlated with the usual HER2<sup>+ </sup>intrinsic subtype. Second, the CC<sup>+</sup>/IR<sup>+ </sup>and CC<sup>+ </sup>clusters defined distinct subtypes of basal tumors. Third, the ECM<sup>+ </sup>cluster was mostly basal, but it contained a relatively high proportion of normal-like and luminal-A-like ER<sup>- </sup>tumors; it also exhibited the most varied grade distribution, with most low-grade ER<sup>- </sup>tumors falling into this class. Finally, the main constituents of the IR<sup>+ </sup>cluster were basal and HER2<sup>+</sup>.</p>
            <tbl id="T2" hint_layout="double">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Distribution of ER<sup>- </sup>samples among clusters, prognostic groups and intrinsic subtypes</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>Cluster</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Outcome</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>LI</p>
                     </c>
                     <c cspan="5" ca="center">
                        <p>ER<sup>-</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Good</p>
                     </c>
                     <c ca="left">
                        <p>Poor</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>Luminal B</p>
                     </c>
                     <c ca="left">
                        <p>Luminal A</p>
                     </c>
                     <c ca="left">
                        <p>Normal</p>
                     </c>
                     <c ca="left">
                        <p>Basal</p>
                     </c>
                     <c ca="left">
                        <p>HER2</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CC<sup>+</sup>/IR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>39</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CC<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ECM<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>36</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>For the five clusters of Figure 3a, we give the number of samples per cluster (<it>n</it>) and their distributions over good/poor outcome, high/low lymphocytic infiltration (LI) score, and intrinsic subtypes as determined by the single sample predictor (SSP) classifier. Clusters were labeled by the main Gene Ontology of genes over-expressed in the group: CC<sup>+ </sup>(cell cycle), CC<sup>+</sup>/IR<sup>+ </sup>(cell cycle and immune response), ECM<sup>+ </sup>(extracellular matrix), SR<sup>+ </sup>(steroid hormone response), and IR<sup>+ </sup>(immune response). ER, estrogen receptor.</p>
               </tblfn>
            </tbl>
            <p>It further follows from these observations that ER<sup>- </sup>normal and luminal A samples were predominantly characterized by over-expression of ECM genes. ER<sup>- </sup>basal tumors, on the other hand, exhibited a more complex pattern and appeared to divide into at least four subgroups (CC<sup>+</sup>/IR<sup>+</sup>, CC<sup>+</sup>, ECM<sup>+</sup>, and IR<sup>+</sup>).</p>
            <p>To investigate further the relation of our five ER<sup>- </sup>subclasses with the intrinsic subtypes, we considered to which ER<sup>- </sup>subclass ER<sup>+ </sup>samples of known intrinsic subtype were most similar. To this end we first constructed, for each of the five ER<sup>- </sup>subclasses, mean centroids over the 813 negative kurtosis genes (Additional data file 5). To validate the centroids, the same ER<sup>- </sup>samples were assigned a subclass using a nearest centroid criterion (samples for which the largest Pearson correlation coefficient was &lt;0.25 were considered unclassified; see Materials and methods, below), which showed that 156 (84%) were classified, of which 143 (92%) were assigned the correct subclass. Next, after using the SSP to classify the 527 ER<sup>+ </sup>samples into the intrinsic subtypes, we assigned each of them to an ER<sup>- </sup>subclass based on the same nearest centroid criterion (Table <tblr tid="T3">3</tblr>). As expected, only 4% of ER<sup>+ </sup>tumors were classified as basal, whereas the majority (82%) of them were luminal. Moreover, the analysis showed that ER<sup>+ </sup>luminal B samples were most similar to CC<sup>+ </sup>and CC<sup>+</sup>/IR<sup>+ </sup>ER<sup>- </sup>samples, which is consistent with the fact that all of these samples over-express cell cycle and cell proliferation genes. In contrast, ER<sup>+ </sup>luminal A samples were most similar to ECM<sup>+ </sup>(63%), IR<sup>+ </sup>(26%), and SR<sup>+ </sup>(8%) ER<sup>- </sup>samples. Not surprisingly, almost all (16/19 [84%]) 'normal' ER<sup>+ </sup>samples were most similar to ECM<sup>+ </sup>ER<sup>- </sup>samples. All basal ER<sup>+ </sup>samples had expression profiles most similar to CC<sup>+</sup>/IR<sup>+ </sup>and CC<sup>+ </sup>subtypes. Interestingly, only 16 of the 42 (38%) ER<sup>+ </sup>HER2<sup>+ </sup>tumors exhibited significantly correlated expression profiles to any one of the five ER<sup>- </sup>subclasses, with most of these (11) mapping to the CC<sup>+</sup>/IR<sup>+ </sup>subtype.</p>
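            <p>The nearest centroid criterion described above can be illustrated with a minimal sketch. It assumes that each sample is a list of expression values over the same ordered set of genes; the function names (pearson, build_centroids, classify) are hypothetical and are not taken from the original analysis code. A sample is left unclassified (None) when its largest Pearson correlation with any centroid is below 0.25.</p>

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    # A constant profile has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def build_centroids(profiles, labels):
    """Gene-wise mean expression profile (centroid) for each subclass."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in groups.items()}


def classify(sample, centroids, threshold=0.25):
    """Return the subclass of the most correlated centroid, or None
    when the largest correlation falls below the threshold."""
    best_label, best_r = max(
        ((label, pearson(sample, c)) for label, c in centroids.items()),
        key=lambda item: item[1])
    return best_label if best_r >= threshold else None
```

            <p>In this sketch, validating the centroids amounts to calling classify on the same samples used in build_centroids and counting how many are assigned their original label.</p>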
            <tbl id="T3" hint_layout="double">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Classification of ER<sup>+ </sup>intrinsic subtypes, medullary breast cancer, and BRCA1 tumors into ER<sup>- </sup>subclasses</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="center">
                        <p>ER<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>MBC</p>
                     </c>
                     <c ca="left">
                        <p>DBC</p>
                     </c>
                     <c ca="left">
                        <p>BRCA1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Luminal B</p>
                     </c>
                     <c ca="left">
                        <p>Luminal A</p>
                     </c>
                     <c ca="left">
                        <p>Normal</p>
                     </c>
                     <c ca="left">
                        <p>Basal</p>
                     </c>
                     <c ca="left">
                        <p>HER2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>97</p>
                     </c>
                     <c ca="left">
                        <p>337</p>
                     </c>
                     <c ca="left">
                        <p>32</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>44</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>n </it>classifiable</p>
                     </c>
                     <c ca="left">
                        <p>37</p>
                     </c>
                     <c ca="left">
                        <p>113</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>33</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CC<sup>+</sup>/IR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CC<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ECM<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>71</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IR<sup>+</sup></p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Classification of ER<sup>+ </sup>intrinsic subtypes (527 samples from the integrated cohort NKI2 + EMC + NCH), 22 medullary breast cancers (MBCs) and 44 ductal breast cancers (DBCs) from [38], and 18 BRCA1 mutants from [3] into ER<sup>- </sup>subclasses, using the nearest centroid criterion with Pearson correlation as the similarity measure. Only samples with Pearson correlation coefficients larger than 0.25 were deemed classifiable. ER, estrogen receptor.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>A subgroup of good prognosis in ER<sup>- </sup>breast cancer</p>
            </st>
            <p>We next considered whether the five identified clusters were associated with different prognostic groups. Because we had merged different cohorts, and it is questionable whether survival data can also be merged, we decided first to test the clusters for association with clinical outcome by using a dichotomized outcome variable. Specifically, poor outcome was defined as any death or distant metastasis event, whereas good outcome was defined as a patient alive or with no distant metastasis. By studying the distribution of good and poor outcome events in the respective clusters, significant associations with prognosis could be found by means of Fisher's exact test (Table <tblr tid="T2">2</tblr>). Interestingly, this showed that the IR<sup>+ </sup>subgroup had better prognosis when compared with the ECM<sup>+ </sup>(<it>P </it>= 0.08), CC<sup>+ </sup>(<it>P </it>= 0.03), and SR<sup>+ </sup>(<it>P </it>= 0.005) subclasses.</p>
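            <p>As an illustration of the test used here, the following minimal sketch implements a two-sided Fisher's exact test for a 2 x 2 table of good and poor outcome counts. The function name fisher_exact_two_sided is hypothetical, and the two-sided convention (summing all tables no more probable than the observed one) is an assumption, because the text does not state which variant was used; the value returned for the Table 2 counts need not therefore match the reported <it>P </it>values exactly.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two clusters; columns are good and poor outcome counts.
    """
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over all tables that are no more probable than the observed
    # one; the small relative tolerance guards against rounding error.
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p_obs * (1 + 1e-9) >= p)


# Counts taken from Table 2: IR+ (23 good, 6 poor) versus SR+ (16 good, 20 poor).
p_ir_vs_sr = fisher_exact_two_sided(23, 6, 16, 20)
```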
            <p>The IR<sup>+ </sup>subgroup also had better prognosis than the CC<sup>+</sup>/IR<sup>+ </sup>subgroup, although the difference was not statistically significant (<it>P </it>= 0.19). We thus combined the CC<sup>+</sup>/IR<sup>+ </sup>and IR<sup>+ </sup>subclasses and evaluated the prognosis of this larger subclass relative to the rest of the ER<sup>- </sup>samples. Consistent with our previous result <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, we found that the ER<sup>- </sup>tumors over-expressing immune response genes had better prognosis than ER<sup>- </sup>samples that did not (<it>P </it>= 0.02).</p>
            <p>To further confirm our findings, we also generated Kaplan-Meier survival curves for the five identified subclasses using TTDM as the end-point (Figure <figr fid="F3">3b</figr>). This showed that the SR<sup>+ </sup>and CC<sup>+ </sup>subclasses had the worst prognosis, whereas the IR<sup>+ </sup>subclass had the best prognosis. Specifically, relative to the IR<sup>+ </sup>subclass the SR<sup>+ </sup>subgroup had a hazard ratio (HR) of 3.70 (95% confidence interval [CI] 1.49 to 9.24; <it>P </it>= 0.005), whereas the CC<sup>+ </sup>subgroup had an HR of 2.75 (95% CI 1.07 to 7.05; <it>P </it>= 0.035). Similarly, relative to the CC<sup>+</sup>/IR<sup>+ </sup>subclass, the SR<sup>+ </sup>subgroup had an HR of 2.35 (95% CI 1.13 to 4.88; <it>P </it>= 0.02), whereas for the CC<sup>+ </sup>subgroup it did not quite reach statistical significance (HR 1.80, 95% CI 0.83 to 3.89; <it>P </it>= 0.13). We verified the statistical significance of these survival differences by 10,000 random permutations of the samples, which showed that the theoretical <it>P </it>value estimates above were essentially identical to the empirically derived <it>P </it>values.</p>
         </sec>
         <sec>
            <st>
               <p>Prognostic markers in ER<sup>- </sup>tumors are associated with immune response functions</p>
            </st>
            <p>The better prognosis of the IR<sup>+ </sup>subclass is likely due to specific genes that are individually prognostic. In order to find these, we applied the PAC algorithm to the 813 genes with negative kurtosis expression profiles. Briefly, this procedure selects features based on whether their expression profiles are best described as a mixture of Gaussians, and then evaluates whether the distribution of good and poor outcome events among the inferred clusters is statistically different from random. Applying PAC, we found 22 genes with a Fisher test <it>P </it>&lt; 0.05 (Additional data file 1). A Gene Ontology (GO) enrichment analysis of these 22 genes using the GOTree Machine <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> showed that immune/defense response was the most enriched GO category, with seven genes falling into this category (<it>C1QA</it>, <it>IGLC2</it>, <it>LY9</it>, <it>TNFRSF17</it>, <it>SPP1</it>, <it>XCL2</it>, and <it>HLA-F</it>). The expression profiles for these seven genes confirmed the presence of distinct clusters with nonrandom distributions of good and poor outcome samples (Figure <figr fid="F4">4</figr> provides detailed profiles of <it>C1QA </it>and <it>IGLC2</it>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Expression profiles of selected prognostic markers in ER<sup>- </sup>breast cancer</p>
               </caption>
               <text>
                  <p>Expression profiles of selected prognostic markers in ER<sup>- </sup>breast cancer. Expression profile (on a log2 scale) of selected prognostic markers <b>(a) </b><it>IGLC2 </it>and <b>(b) </b><it>C1QA </it>in the integrated cohort of 186 ER<sup>- </sup>tumors (NKI2 + EMC + NCH), and in the validation cohorts UPP and JRH-2. Good outcome samples are shown in green, and poor outcome samples in blue. Clusters were inferred using the variational Bayesian approach in NKI2 + EMC + NCH and the pam algorithm in the UPP and JRH-2 cohorts. Inferred clusters are indicated by different shapes (triangles and diamonds). ER, estrogen receptor; pam, partitioning around medoids.</p>
               </text>
               <graphic file="gb-2007-8-8-r157-4"/>
            </fig>
            <p>In contrast, application of the same method to ER<sup>+ </sup>breast cancer yielded 29 genes (out of a possible 193) with <it>P </it>&lt; 0.05 (Additional data file 2). Although this list contained only 29 genes, many had mitotic cell cycle functions, notably <it>UBE2C</it>, <it>MAD2L1</it>, <it>E2F1 </it>and <it>KIFC1</it>, and GO enrichment analysis confirmed that the cell cycle GO was the most significantly enriched category, followed by transcription regulatory activity. This result for ER<sup>+ </sup>breast cancer confirms findings reported elsewhere that poor prognosis in ER<sup>+ </sup>breast cancer is related mainly to over-expression of genes in cell cycle and cell proliferation pathways <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>. Notably, none of the identified prognostic genes were related to immune response functions.</p>
            <p>Having identified a prognostic module of seven immune response genes (henceforth called the IR module), we next confirmed that clustering over this module resulted in clusters significantly associated with clinical outcome. Specifically, the partitioning around medoids (pam) clustering algorithm <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> with two centers predicted one cluster with 52 good outcome and 56 poor outcome patients, and another cluster with 56 good outcome and only 19 poor outcome patients, which was highly significant under Fisher's exact test (<it>P </it>= 0.0004; Figure <figr fid="F3">3c</figr>). Kaplan-Meier and Cox regression analyses further confirmed that the group not over-expressing the seven-gene module (in this group, six of the seven genes are not over-expressed) had a greater risk for distant metastasis (HR 2.02, 95% CI 1.2 to 3.4; <it>P </it>= 0.009; Figure <figr fid="F3">3d</figr>).</p>
         </sec>
         <sec>
            <st>
               <p>Relation to <it>STAT1 </it>and IFN cluster</p>
            </st>
            <p>Next, we investigated the relation of the good prognosis subgroup identified here with the novel IFN-regulated cluster identified recently <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. As shown previously <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, the IFN cluster, defined by over-expression of interferon-regulated genes, including the transcriptional regulator <it>STAT1</it>, was most closely related to the basal subclass and had a prognostic performance similar to that of luminal B samples when compared with the luminal A and normal subtypes. Interestingly, when compared with the basal and HER2<sup>+ </sup>subclasses, the IFN group had better prognosis, although a formal comparison was not conducted by Hu and coworkers <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
            <p>Among the 98 genes in the immune response gene cluster (Figure <figr fid="F3">3a</figr> and Additional data file 1), we identified a total of 14 with interferon-related functions, including <it>STAT1</it>, <it>SP110</it>, <it>NFKBI</it>, <it>IFI44</it>, <it>IFNGR1</it>, <it>ISGF3G</it>, and <it>IRF7</it>. Interestingly, however, none of these genes showed association with prognosis except <it>SP110</it>. Thus, although it appears that our subclass is related to the IFN class discovered by Hu and coworkers <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, it is also distinct in that the associated prognostic markers are not in the IFN cluster.</p>
         </sec>
         <sec>
            <st>
               <p>Immune response module predicts outcome independently of lymph node status</p>
            </st>
            <p>Because our IR gene cluster included <it>STAT1 </it>and interferon-induced genes, and these genes have been shown to be associated with lymph node (LN) metastasis <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, we considered whether the subgroup of ER<sup>- </sup>samples over-expressing the 98-gene IR cluster was significantly associated with LN status. Because all patients in the EMC cohort were LN negative, this analysis was only performed on the NKI2 and NCH cohorts (109 ER<sup>- </sup>patients, of whom 42 had LN involvement). Using pam clustering with two centers over these 98 genes and 109 samples, we found subgroups with similar proportions of LN metastases (43 LN negative and 28 LN positive versus 24 LN negative and 14 LN positive; Fisher's exact test <it>P </it>= 0.84). Moreover, clustering only over the 14 genes involved in the IFN subcluster still did not show a significantly nonrandom distribution of LN metastases among the clusters (<it>P </it>= 0.23), although in agreement with Huang and coworkers <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> the cluster over-expressing the IFN genes had proportionally more LN metastases. Similarly, the distribution of LN metastases among the two clusters predicted by the IR module was not significantly different from random (<it>P </it>= 0.48). LN status itself was a significant predictor of distant metastasis in both univariate (HR 2.28, 95% CI 1.36 to 3.85; <it>P </it>= 0.002) and multivariate (HR 2.16, 95% CI 1.28 to 3.64; <it>P </it>= 0.004) Cox regression analysis, where the multivariate model included LN status and the IR module as predictors. Importantly, the IR module remained a prognostic predictor for TTDM (HR 1.93, 95% CI 1.14 to 3.26; <it>P </it>= 0.015) in this multivariate model (Table <tblr tid="T4">4</tblr>). This showed that the IR module identified here adds independent prognostic value over LN status in ER<sup>- </sup>breast cancer.</p>
            <tbl id="T4" hint_layout="double">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Univariate and multivariate Cox regression models</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>ER<sup>- </sup>(<it>n </it>= 186)</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>ER<sup>+ </sup>(<it>n </it>= 527)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>HR (95% CI)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>HR (95% CI)</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN</p>
                     </c>
                     <c ca="left">
                        <p>2.28 (1.36-3.85)</p>
                     </c>
                     <c ca="left">
                        <p>0.002</p>
                     </c>
                     <c ca="left">
                        <p>2.07 (1.47-2.90)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;10<sup>-4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LI</p>
                     </c>
                     <c ca="left">
                        <p>1.06 (0.66-1.70)</p>
                     </c>
                     <c ca="left">
                        <p>0.9</p>
                     </c>
                     <c ca="left">
                        <p>1.50 (0.72-3.14)</p>
                     </c>
                     <c ca="left">
                        <p>0.58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IRM</p>
                     </c>
                     <c ca="left">
                        <p>2.02 (1.19-3.41)</p>
                     </c>
                     <c ca="left">
                        <p>0.009</p>
                     </c>
                     <c ca="left">
                        <p>1.25 (0.91-1.71)</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN<sup>a </sup>+ IRM</p>
                     </c>
                     <c ca="left">
                        <p>2.16 (1.28-3.64)</p>
                     </c>
                     <c ca="left">
                        <p>0.004</p>
                     </c>
                     <c ca="left">
                        <p>2.10 (1.49-2.96)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;10<sup>-4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN + IRM<sup>a</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.93 (1.14-3.26)</p>
                     </c>
                     <c ca="left">
                        <p>0.015</p>
                     </c>
                     <c ca="left">
                        <p>1.29 (0.94-1.76)</p>
                     </c>
                     <c ca="left">
                        <p>0.11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LI<sup>a </sup>+ IRM</p>
                     </c>
                     <c ca="left">
                        <p>0.86 (0.32-2.28)</p>
                     </c>
                     <c ca="left">
                        <p>0.76</p>
                     </c>
                     <c ca="left">
                        <p>1.75 (0.41-7.47)</p>
                     </c>
                     <c ca="left">
                        <p>0.45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LI + IRM<sup>a</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.05 (0.71-5.97)</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                     <c ca="left">
                        <p>0.57 (0.27-1.19)</p>
                     </c>
                     <c ca="left">
                        <p>0.13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN<sup>a </sup>+ LI + IRM</p>
                     </c>
                     <c ca="left">
                        <p>1.79 (0.70-4.62)</p>
                     </c>
                     <c ca="left">
                        <p>0.22</p>
                     </c>
                     <c ca="left">
                        <p>1.48 (0.68-3.19)</p>
                     </c>
                     <c ca="left">
                        <p>0.32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN + LI<sup>a </sup>+ IRM</p>
                     </c>
                     <c ca="left">
                        <p>0.84 (0.51-1.38)</p>
                     </c>
                     <c ca="left">
                        <p>0.72</p>
                     </c>
                     <c ca="left">
                        <p>1.65 (0.38-7.07)</p>
                     </c>
                     <c ca="left">
                        <p>0.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN + LI + IRM<sup>a</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.22 (0.76-6.50)</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                     <c ca="left">
                        <p>0.57 (0.27-1.20)</p>
                     </c>
                     <c ca="left">
                        <p>0.14</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The table summarizes the hazard ratio (HR), 95% confidence interval (CI), and log-rank test <it>P </it>values of univariate Cox proportional hazards regression models, with lymph node status (LN = 1/0 for LN +/-), level of lymphocytic infiltration (LI = 1 for low infiltration score, and LI = 0 for high infiltration score), and the classification based on the seven-gene immune response related module (IRM; 2 = downregulation of module, 1 = upregulation of module) as predictors. <sup>a</sup>In rows listing more than one predictor, the values are those of the marked predictor in the multivariate Cox model that includes all predictors listed in that row. The table compares the values for estrogen receptor (ER)<sup>- </sup>and ER<sup>+ </sup>breast cancer.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Immune response module predicts outcome independently of lymphocytic infiltration</p>
            </st>
            <p>The upregulation of immune response genes in good prognosis tumors could be explained by the fact that these tumors elicit a stronger immune and inflammatory response, as measured for example by a higher degree of lymphocytic infiltration (LI). The association of high LI with good prognosis is well known <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, and although a few conflicting results have also been reported <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, we thought it natural to consider whether upregulation of the identified immune response module conferred good prognosis independently of LI. To this end, we scored the samples in the NCH cohort for LI (see Materials and methods, below) and combined these with the available LI score information from the NKI2 cohort, yielding a total of 50 scored samples. We found that although there were proportionally more tumors with high LI scores in the group over-expressing the immune response genes (specifically, there were 11 high LI and 9 low LI samples versus 8 high LI and 22 low LI in the under-expressing group; Fisher's exact test, <it>P </it>= 0.07), a multivariate Cox regression with TTDM as the outcome variable and the seven-gene IR module and LI score as predictors showed that the hazard ratio associated with the immune response module was essentially unchanged after adjustment for LI (HR 2.05, 95% CI 0.71 to 5.97; <it>P </it>= 0.19; Table <tblr tid="T4">4</tblr>). (The <it>P </it>value does not reach statistical significance, most likely because of the relatively small sample size.) Supporting this result further, we did not find in ER<sup>- </sup>tumors any significant association between LI and clinical outcome (Table <tblr tid="T4">4</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>Relation to medullary and BRCA1 ER<sup>- </sup>breast cancer</p>
            </st>
            <p>To further confirm that the prognostic power of the IR module is independent of LI, we investigated the relationship of the identified good prognosis tumors with medullary breast cancers (MBCs), which are characterized by high LI scores and relatively good prognosis <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. We therefore considered to which ER<sup>- </sup>subclass the 22 MBC expression profiles from the report by Bertucci and coworkers <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> were most similar (Table <tblr tid="T3">3</tblr>). (For completeness and reference, we also performed the analysis for the 44 ductal breast cancers [DBCs], also profiled by Bertucci and coworkers <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.) This showed that of the 22 MBCs, 20 showed reasonably strong correlation to one of the ER<sup>- </sup>subclass centroids, 14 of which (70%) were most similar to the CC<sup>+</sup>/IR<sup>+ </sup>subtype, whereas only two (11%) were most similar to the IR<sup>+ </sup>subtype. In contrast, of the 33 DBCs that could be classified, only four (12%) and three (9%) were most similar to the CC<sup>+</sup>/IR<sup>+ </sup>and IR<sup>+ </sup>subtypes, respectively. These results mirror the distribution of LI scores across the five ER<sup>- </sup>subclasses (Figure <figr fid="F3">3a</figr> and Table <tblr tid="T2">2</tblr>) and further confirm that the best prognostic group (IR<sup>+</sup>) is not related to MBC, whereas CC<sup>+</sup>/IR<sup>+ </sup>is.</p>
            <p>A similar analysis was performed for BRCA1 mutant tumors. Of the 16 BRCA1 mutants from the NKI2 cohort <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> that were deemed classifiable based on a 0.25 correlation threshold, 13 (81%) had expression profiles most similar to the CC<sup>+</sup>/IR<sup>+ </sup>subtype (Table <tblr tid="T3">3</tblr>). None showed similarity to the IR<sup>+ </sup>subclass. This suggests that ER<sup>- </sup>BRCA1 mutants, in common with MBCs, are most similar to the CC<sup>+</sup>/IR<sup>+ </sup>subclass identified here.</p>
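            <p>The centroid assignment described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes Pearson correlation as the similarity measure (the text specifies only a 0.25 correlation threshold), and the function names and toy centroid values are hypothetical.</p>

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def assign_subclass(profile, centroids, threshold=0.25):
    """Return the label of the best-correlated centroid, or None if the
    best correlation does not reach the threshold (unclassifiable)."""
    scores = {label: pearson(profile, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Toy, made-up centroids over five genes (NOT values from the study).
toy_centroids = {
    "IR+": [1.0, 0.8, -0.5, -0.7, 0.1],
    "CC+/IR+": [0.9, -0.6, 0.7, -0.4, 0.2],
}
toy_profile = [0.7, 0.9, -0.3, -0.8, 0.0]
label = assign_subclass(toy_profile, toy_centroids)
print(label)
```

            <p>A profile whose best correlation falls below the threshold is returned as unclassifiable (None), which corresponds to the samples excluded from the percentages reported above.</p>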
         </sec>
         <sec>
            <st>
               <p>External validation</p>
            </st>
            <p>Having identified a prognostic module related to immune response, we next attempted to validate this finding using two independent breast cancer cohorts <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B19">19</abbr></abbrgrp>. Specifically, the hypothesis to be tested was that over-expression of the identified prognostic markers is associated with good prognosis, except for <it>SPP1</it>, for which good prognosis is hypothesized to be associated with under-expression. Given the relatively small sample size of the two external cohorts, an algorithm that attempts to learn the optimal number of clusters (as implemented in PACK) is unlikely to capture structure because of its large false negative rate <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Hence, in order to define groups of over-expression and under-expression, we applied the pam algorithm with two centers to each of the seven genes in each of the two cohorts (Figure <figr fid="F4">4</figr> and Additional data files 6 and 7). Because the small number of events (six in each of the two external cohorts) implies a highly discrete <it>P </it>value distribution, Fisher's exact test <it>P </it>values would be too conservative and would approximate type I error rates poorly <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. To overcome this difficulty, we also considered the distribution of good and poor outcome samples over the combined clusters of over-expression and under-expression (Table <tblr tid="T5">5</tblr>). This showed that four of the seven genes (<it>C1QA</it>, <it>IGLC2</it>, <it>TNFRSF17</it>, and <it>LY9</it>) were also highly prognostic in these two external cohorts, thus confirming the validity of our finding. Moreover, as in the three original cohorts, over-expression of these genes in the two external cohorts was associated with good prognosis (Figure <figr fid="F4">4</figr> and Additional data files 6 and 7). For the other three genes (<it>HLA-F</it>, <it>SPP1</it>, and <it>XCL2</it>), <it>P </it>values did not reach statistical significance (<it>P </it>about 0.2), yet their expression profile trends were entirely consistent with those found in the integrated cohort, thus supporting their role as members of a robust prognostic module (Additional data files 6 and 7).</p>
            <tbl id="T5" hint_layout="double">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>External validation of immune response prognostic module: distribution of poor and good outcome patients in over-expressed and under-expressed subgroups</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>Gene</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>UPP</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>JRH-2</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Combined</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Poor</p>
                     </c>
                     <c ca="left">
                        <p>Good</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Poor</p>
                     </c>
                     <c ca="left">
                        <p>Good</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Poor</p>
                     </c>
                     <c ca="left">
                        <p>Good</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>P</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IGLC2</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>0.04</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>0.09</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LY9</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>0.14</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>0.007</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TNFRSF17</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>0.02</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C1QA</p>
                     </c>
                     <c ca="left">
                        <p>High</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>0.04</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>0.04</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Low</p>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>To confirm the robustness of our findings, we used the pam algorithm to classify patients in the two external cohorts into clusters over-expressing and under-expressing the IR module (Figure <figr fid="F5">5a,b</figr>). Remarkably, the predicted 20-sample cluster over-expressing the module was composed entirely of good outcome patients, whereas the remaining 35-sample cluster included 12 poor outcome events (Figure <figr fid="F5">5c</figr>), which was highly significant under Fisher's exact test (<it>P </it>= 0.002; the HR cannot be estimated because one cluster did not have any events).</p>
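            <p>The Fisher's exact test on the cluster-by-outcome table above can be reproduced with a short standard-library sketch. It assumes a two-sided test that sums all tables no more probable than the observed one; the text does not state which alternative was used, and the function name is hypothetical. The counts (0 poor and 20 good versus 12 poor and 23 good) follow from the cluster sizes and event numbers given above.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p_obs * (1 + 1e-7) >= p
    )


# Rows: over-expressing cluster, under-expressing cluster.
# Columns: poor outcome, good outcome.
p_value = fisher_exact_two_sided(0, 20, 12, 23)
print(f"P = {p_value:.4f}")
```

            <p>Under these assumptions the result should be close to the reported <it>P </it>= 0.002; a one-sided test would give a slightly smaller value.</p>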
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Pam clustering over IR module in external ER<sup>- </sup>cohorts</p>
               </caption>
               <text>
                  <p>Pam clustering over IR module in external ER<sup>- </sup>cohorts. Heatmap of gene expression of the seven-gene IR module in ER<sup>- </sup>samples of the <b>(a) </b>UPP and <b>(b) </b>JRH-2 cohorts. Shown are the clusters over-expressing (purple) and under-expressing (yellow) the IR module, as predicted by the pam algorithm. Good outcome samples are shown in gray, and poor outcome samples in black. Green indicates relative under-expression, and red indicates relative over-expression. <b>(c) </b>Kaplan-Meier survival curves over the combined external cohorts (for UPP the end-point was disease-specific survival, and for JRH-2 it was recurrence-free survival), with the number of events and samples in each of the two predicted groups. ER, estrogen receptor; pam, partitioning around medoids.</p>
               </text>
               <graphic file="gb-2007-8-8-r157-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>The prognostic value of the IR module is specific to ER<sup>- </sup>tumors</p>
            </st>
            <p>To confirm that the good prognosis conferred by activation of the IR module is specific to ER<sup>- </sup>breast cancer, we applied the same pam clustering algorithm over the seven genes to the integrated dataset of 527 ER<sup>+ </sup>breast tumors. This gave two clusters with unequal distributions of good and poor outcome samples (209 good and 99 poor for the cluster under-expressing the genes versus 163 good and 47 poor for the cluster over-expressing the genes; <it>P </it>= 0.02). Although this suggested to us that over-expression of this seven-gene module also conferred better prognosis in ER<sup>+ </sup>samples, the association was much weaker than for ER<sup>- </sup>samples. Univariate Cox regression with TTDM as the outcome variable confirmed that under-expression of this seven-gene module conferred a much greater risk for distant metastasis in ER<sup>- </sup>tumors (HR 2.02, 95% CI 1.2 to 3.4; <it>P </it>= 0.009) than in ER<sup>+ </sup>tumors (HR 1.25, 95% CI 0.9 to 1.7; <it>P </it>= 0.16; Table <tblr tid="T4">4</tblr>). It is also noteworthy that, in contrast to the ER<sup>- </sup>case, in the multivariate model setting for ER<sup>+ </sup>tumors, a low LI score and LN involvement were stronger predictors of TTDM than the seven-gene module (HR 1.65, 95% CI 0.4 to 7.1 for LI score and HR 1.48, 95% CI 0.7 to 3.2 for LN involvement, versus HR &lt;1; Table <tblr tid="T4">4</tblr>). The specificity of our prognostic module to ER<sup>- </sup>breast cancer was confirmed by application of PACK, which showed that, with the exception of <it>XCL2</it>, none of the seven genes was individually prognostic.</p>
            <p>In this context, it is worth noting once again the absence of immune response related genes among the 29 PACK-derived prognostic genes in ER<sup>+ </sup>disease, which would suggest that a good prognosis IR related subtype is absent in ER<sup>+ </sup>breast cancer. To investigate this further we checked, by performing Wilcoxon rank sum tests on the 5,007 genes, that the absence of immune response GOs was not just an artifact of the small number of genes picked out by PACK. In fact, GO analysis (using GOTree Machine) of the top 500 genes obtained from this rank sum test (all with q-values &lt; 0.05) showed that cell cycle and transcription regulator activity related GOs were the only categories with highly significant <it>P </it>values (uncorrected <it>P </it>&lt; 10<sup>-6</sup>), and that the only enriched immune response related GO was that of humoral immune response, with 11 represented genes (including <it>BLM</it>, <it>FADD</it>, <it>C3</it>, <it>C7</it>, <it>BCL2</it>, <it>NFKB1</it>, and the IR module member <it>TNFRSF17</it>), which was only marginally enriched (uncorrected <it>P </it>= 0.005). Although we verified that this humoral immune response module was associated with prognosis in the integrated ER<sup>+ </sup>cohort independent of LN status and LI (HR 2.26, 95% CI 0.97 to 5.26; <it>P </it>= 0.06), we were unable to validate this prognostic module in the external UPP and JRH-2 cohorts (Additional data file 8). Moreover, the co-regulation patterns for the genes in this module were less coherent than those for the IR module in ER<sup>- </sup>breast cancer (Figure <figr fid="F5">5</figr> and Additional data file 8). Hence, independent of the methodology used, an IR related prognostic module in ER<sup>+ </sup>breast cancer could not be identified, which seems to suggest that a good prognosis subtype related to IR is specific to ER<sup>- </sup>disease.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>A striking difference between ER<sup>+ </sup>and ER<sup>- </sup>disease is emerging at the level of mRNA expression. Although in ER<sup>+ </sup>disease a significant number of genes have been found that correlate with clinical outcome <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B10">10</abbr><abbr bid="B18">18</abbr><abbr bid="B22">22</abbr></abbrgrp>, in ER<sup>- </sup>disease no such prognostic signatures have thus far been reported. Moreover, although in ER<sup>+ </sup>tumors subtypes of different prognostic risks, the luminal A and B subtypes, have been defined <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, no such subdivisions have been noted for ER<sup>- </sup>breast cancer. It is known that the two main subtypes of ER<sup>- </sup>breast cancer (ER<sup>-</sup>/HER2<sup>+ </sup>and basals) have worse prognosis compared with the luminal A subtype, but no outcome differences between the ER<sup>-</sup>/HER2<sup>+ </sup>and basal subtypes have been observed <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B26">26</abbr></abbrgrp>.</p>
         <p>We believe that these differences between ER<sup>+ </sup>and ER<sup>- </sup>disease are related to the different histopathologic characteristics of the tumors. The prognostic signatures derived for ER<sup>+ </sup>breast cancer are characterized by genes related to cell cycle and cell proliferation pathways, and are also highly correlated with the histologic grade of the tumors <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B22">22</abbr></abbrgrp>. It is not a coincidence that most luminal B tumors are of high grade, whereas the great majority of luminal A tumors are of low grade <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B22">22</abbr></abbrgrp>. It appears that a wide range of oncogenic pathways may drive the over-activation of cell cycle and cell growth pathways in poor prognosis tumors. This would explain the larger number of prognostic genes found in ER<sup>+ </sup>disease (the great majority of which are related to cell cycle ontologies) as well as the stronger prognostic signals (relatively large differences in log2 expression between poor and good prognosis tumors), an effect that is probably driven by oncogenic amplifications. This interpretation would also fit well with our finding that most bimodal profiles in ER<sup>+ </sup>breast cancer have positive kurtosis values, because this could reflect a more diverse range of small amplifier subgroups in ER<sup>+ </sup>breast cancer.</p>
         <p>In contrast, most ER<sup>- </sup>tumours are of high grade, which would explain why any differences in clinical outcome within ER<sup>- </sup>disease are not related to differential activation of cell cycle pathways. Instead, the work presented here shows that differences in clinical outcome within ER<sup>- </sup>disease are mainly related to differentially expressed genes in the complement and immune response pathways, and that the association with prognosis can be independent of lymphocytic infiltration (LI) and LN status. In fact, for ER<sup>- </sup>tumors we observed that even though there were proportionally more high LI samples in the group over-expressing the IR module, these did not necessarily have better prognosis. The fact that LI could not explain the observed association of the IR module with outcome was supported further by our finding that medullary breast cancers, which are characterized by high LI scores, had expression profiles most similar to the CC<sup>+</sup>/IR<sup>+ </sup>subtype rather than the IR<sup>+ </sup>subtype, which had the best prognosis overall. The better prognosis of the CC<sup>+</sup>/IR<sup>+ </sup>subtype relative to CC<sup>+ </sup>and SR<sup>+ </sup>ER<sup>- </sup>breast cancer is therefore entirely consistent with the CC<sup>+</sup>/IR<sup>+ </sup>subclass being medullary breast cancers (MBCs), as MBC is known to have marginally better prognosis than other basal tumors <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. On the other hand, the IR<sup>+ </sup>subclass, which had the best prognosis among the five ER<sup>- </sup>subclasses, was only marginally associated with high LI and was unrelated to MBC. Also consistent with these observations, it is important to note again the distinction between the identified seven-gene prognostic IR module and the 98-gene IR cluster that was derived from unsupervised hierarchical clustering. 
Clearly, we found a strong statistical association between high LI and over-expression of the 98-gene IR cluster. Specifically, there were 13 high LI and eight low LI samples in the combined IR<sup>+ </sup>and CC<sup>+</sup>/IR<sup>+ </sup>clusters relative to six high LI and 23 low LI samples in the rest of the cohort (Fisher test <it>P </it>= 0.007). In contrast, the association between high LI and over-expression of the IR module was much weaker (<it>P </it>= 0.07). Again, this suggests that a significant number of genes in the 98-gene IR cluster show expression variability that is not explained by LI. This is confirmed further by two recent studies that profiled breast cancer cell lines <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B38">38</abbr></abbrgrp>, which showed that a considerable number of immune response related genes do exhibit significant variable expression across the basal cell subtype. Moreover, we found that two (<it>SPP1 </it>and <it>HLA-F</it>) of the three IR module genes that we could map to good quality probes in <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B38">38</abbr></abbrgrp> showed twofold changes across the eight basal cell lines.</p>
         <p>Thus, these findings together suggest that a significant proportion of the expression of the IR module genes in the good prognosis tumors is tumor-intrinsic in origin. That tumor-intrinsic expression of IR genes can have an impact on prognosis of breast cancer patients is plausible in view of recent studies that show, for example, how amplification of kinase oncogenes can activate the nuclear factor-<it>&#954;</it>B pathway, and hence immune response pathways, in both breast cancer cell lines and patient derived breast tumors <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Similarly, another recent study <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> used breast cancer cell lines to show how BRCA1/IFN-<it>&#947; </it>pathways may regulate target genes involved in innate immune response, providing another possible mechanism for tumor intrinsic IR gene expression variability.</p>
         <p>Although we identified only a relatively small module of prognostic genes, we were able to validate their prognostic potential in two external cohorts. It is likely that an integrative analysis similar to the one used here but applied to multiple cohorts that were all profiled on exactly the same genome-wide platform would allow further expansion of this module to include other members of the complement and immune response pathways. Interestingly, of the seven prognostic markers that compose the IR module, two have already been associated with clinical outcome in breast cancer. Specifically, <it>C1QA</it>, which is a gene involved in the classical complement pathway, was recently shown to harbor a single nucleotide polymorphism that correlated with distant metastasis in breast cancer <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Two recent studies also implicated <it>SPP1 </it>(osteopontin) in metastatic breast cancer <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>.</p>
         <p>It is also important to note that the robustness of the identified prognostic markers is a consequence of the PACK methodology. Despite being a conservative procedure that filters out many true positives, PACK allows, by efficient removal of false positives, a more reliable identification of prognostic markers. We tested this further by applying two popular statistical tools, singular value decomposition (SVD) <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and the shrunken centroids classifier (PAMR) <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, to the integrated ER<sup>- </sup>dataset to determine whether we could derive a similar if not identical prognostic IR module. Using SVD we found that none of the inferred SVD components showed a correlation with prognosis (Wilcoxon rank sum test <it>P </it>> 0.05), whereas at an FDR threshold of 0.3, PAMR yielded 21 prognostic genes, of which only two (the IR module member <it>TNFRSF17 </it>and <it>KLRD1</it>) had immune response related functions. Thus, in agreement with findings presented previously <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, this reinforces the advantage of PACK over other pattern recognition tools and over supervised methods that do not use pattern recognition steps, such as those based on <it>t</it>-tests.</p>
         <p>The absence of a prognostic IR module in ER<sup>+ </sup>breast cancer is intriguing. The seven-gene IR module was only marginally associated with prognosis in ER<sup>+ </sup>disease, and importantly this association was not independent of LI or LN status. Repeating the same unsupervised (PAK) and semi-supervised (PACK) analyses in ER<sup>+ </sup>breast cancer likewise did not yield a prognostic immune response module. A traditional supervised method did identify a prognostic IR module, but this module failed to validate in the two external cohorts. Thus, to determine fully whether such a robust prognostic IR module exists for ER<sup>+ </sup>breast cancer, it may be necessary to conduct larger integrative studies that use the same microarray platform so that the analysis can be performed over a larger set of common genes.</p>
         <p>Besides identifying a module of genes that is prognostic in over 240 ER<sup>- </sup>breast tumors, PACK also provided us with a novel subclassification of ER<sup>- </sup>breast cancer. Specifically, clustering over PACK selected genes identified five different subtypes (CC<sup>+</sup>, CC<sup>+</sup>/IR<sup>+</sup>, IR<sup>+</sup>, ECM<sup>+</sup>, and SR<sup>+</sup>) characterized by the over-expression patterns of four distinct gene clusters, each enriched for IR, ECM, CC, and SR genes, respectively. Moreover, we related these subtypes to the gene expression based intrinsic subclasses. This showed that the basal subgroup was a heterogeneous group with at least four distinct subtypes (CC<sup>+</sup>, CC<sup>+</sup>/IR<sup>+</sup>, ECM<sup>+</sup>, and IR<sup>+</sup>), whereas the ER<sup>-</sup>/HER2<sup>+ </sup>subgroup showed strong overlap with the SR<sup>+ </sup>and IR<sup>+ </sup>subtypes.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>While in ER<sup>+ </sup>breast cancer prognostic markers are associated mainly with cell cycle pathways, in ER<sup>- </sup>disease prognostic markers are associated with immune response pathways. In particular, we have identified a subclass of ER<sup>- </sup>tumors that over-express immune response genes and that has a good prognosis compared with the rest of ER<sup>- </sup>breast tumors, independently of LN status or LI. Furthermore, we have identified an associated module of complement and immune response genes that define prognostic markers valid in over 240 ER<sup>- </sup>samples.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets and gene annotation</p>
            </st>
            <p>The microarray breast cancer datasets considered in this work are described elsewhere <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. For these cohorts we used the normalized data, which are available in the public domain (see <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>). The retrieved datasets were further normalized, if necessary, by transforming them onto a common log2 scale and shifting the median of each array to zero. We also created an automated computational pipeline (Perl scripts on a Linux platform) to crosslink the annotation provided for each dataset with UniGene. For some datasets, the linkage relied on Ensembl <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> external database identifiers. Thus each probe was associated with a universal gene name. This procedure generated a nonredundant set of gene identifiers for the subsequent integrative analysis.</p>
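            <p>The normalization step described above can be sketched as follows. This is only an illustrative sketch and not the pipeline used in this work: the function name <it>log2_median_center</it> and the intensity values are hypothetical, and the input is assumed to consist of strictly positive raw intensities (data already on a log2 scale would need only the median shift).</p>
            <p>
```python
import math
import statistics


def log2_median_center(array_intensities):
    """Log2-transform one array's positive intensities and shift its median to zero."""
    logged = [math.log2(v) for v in array_intensities]
    med = statistics.median(logged)
    return [v - med for v in logged]


# Hypothetical raw intensities for a single array (illustrative values only).
raw = [120.0, 480.0, 960.0, 240.0, 60.0]
centered = log2_median_center(raw)
```
            </p>
            <p>Applying the same function to every array places all arrays on a common log2 scale with zero median, which is the property that the integrative analysis relies on.</p>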
         </sec>
         <sec>
            <st>
               <p>PACK: profile analysis using clustering and kurtosis</p>
            </st>
            <p>The hypothesis underlying PACK <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> is that genes that are true biologic or clinical markers have expression profiles that are generated by a mixture of two or more underlying distributions, whereas spurious features are more likely to have profiles generated by a single distribution. The biologic validity of this assumption was proved through an FDR analysis <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
            <p>PACK can be viewed as a semi-supervised algorithm, consisting of two main steps: a feature selection step and a supervised step, in which the selected features are correlated to a phenotype (Figure <figr fid="F2">2</figr>). It is important to note that PACK is a flexible modular algorithm in that the feature selection step can be applied on its own. In this case, there are two possible versions of the algorithm: PAC and PAK. The precise way in which these two algorithms are used in PACK will depend on the purpose of the exercise. Below, we describe the PACK strategy implemented in this paper, which is slightly different from that applied previously <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Feature selection with PAK: using negative kurtosis to find genes defining major subclasses</p>
            </st>
            <p>Kurtosis is related to the fourth central moment and can conveniently be defined as follows <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>:</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i1">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>K</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>X</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8801;</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>E</m:mi>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mover accent="true">
                                          <m:mi>X</m:mi>
                                          <m:mo>&#175;</m:mo>
                                       </m:mover>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mn>4</m:mn>
                                 </m:msup>
                                 <m:mo stretchy="false">]</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>E</m:mi>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mover accent="true">
                                                <m:mi>X</m:mi>
                                                <m:mo>&#175;</m:mo>
                                             </m:mover>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                       <m:mo stretchy="false">]</m:mo>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>3</m:mn>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaacaWGlbGaaiikaiaadIfacaGGPaGaeyyyIO7aaSaaaeaaieaacaWFfbGaai4waiaacIcacaWFybGaeyOeI0Iab8hwayaaraGaaiykamaaCaaaleqabaGaaGinaaaakiaac2faaeaacaWFfbGaai4waiaacIcacaWFybGaeyOeI0Iab8hwayaaraGaaiykamaaCaaaleqabaGaaGOmaaaakiaac2fadaahaaWcbeqaaiaaikdaaaaaaOGaeyOeI0IaaG4maaaa@49C6@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where X is any random variable and E denotes the expectation. For a gaussian <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i2"><m:semantics><m:mrow><m:mi>E</m:mi><m:mo stretchy="false">[</m:mo><m:msup><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>X</m:mi><m:mo>&#8722;</m:mo><m:mover accent="true"><m:mi>X</m:mi><m:mo>&#175;</m:mo></m:mover><m:mo stretchy="false">)</m:mo></m:mrow><m:mn>4</m:mn></m:msup><m:mo stretchy="false">]</m:mo><m:mo>=</m:mo><m:mn>3</m:mn><m:mi>E</m:mi><m:msup><m:mrow><m:mo stretchy="false">[</m:mo><m:msup><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>X</m:mi><m:mo>&#8722;</m:mo><m:mover accent="true"><m:mi>X</m:mi><m:mo>&#175;</m:mo></m:mover><m:mo stretchy="false">)</m:mo></m:mrow><m:mn>2</m:mn></m:msup><m:mo stretchy="false">]</m:mo></m:mrow><m:mn>2</m:mn></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaacaWFfbGaai4waiaacIcacaWFybGaeyOeI0Iab8hwayaaraGaaiykamaaCaaaleqabaGaaGinaaaakiaac2facqGH9aqpcaaIZaGaa8xraiaacUfacaGGOaGaa8hwaiabgkHiTiqa=HfagaqeaiaacMcadaahaaWcbeqaaiaaikdaaaGccaGGDbWaaWbaaSqabeaacaaIYaaaaaaa@44F6@</m:annotation></m:semantics></m:math></inline-formula>, so that K(X) = 0. Most nongaussian distributions necessarily have either K > 0, in which case they are called supergaussian or leptokurtic, or K &lt; 0, in which case they are called subgaussian or platykurtic. Specifically, a mixture of two approximately equal mass gaussians must have negative kurtosis because the two modes on either side of the center of mass effectively flatten out the distribution. To see this, consider a gene whose expression profile is described by a mixture of two gaussians. Then, the kurtosis, K, is a function of two parameters (we assume for simplicity that the gaussians are of equal variance <it>&#963;</it><sup>2</sup>, although this assumption is not needed for the result below); the effect size of the gene, as defined by the effective separation e between the two gaussians (e = <it>&#956;</it>/<it>&#963;</it>, where <it>&#956; </it>is the separation), and the ratio of cluster weights (<it>&#960;</it><sub>1</sub>, <it>&#960;</it><sub>2</sub>), that is R = <it>&#960;</it><sub>1</sub>/(1 - <it>&#960;</it><sub>1</sub>). Specifically, a short computation reveals that</p>
            <p>
               <display-formula id="M2">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i3">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>K</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>e</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>R</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:msup>
                              <m:mi>e</m:mi>
                              <m:mn>4</m:mn>
                           </m:msup>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>R</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>R</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>a</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>R</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>b</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>+</m:mo>
                                       <m:mi>R</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mn>4</m:mn>
                                 </m:msup>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>+</m:mo>
                                       <m:mfrac>
                                          <m:mi>R</m:mi>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>R</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mn>2</m:mn>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:msup>
                                          <m:mi>e</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaacaWGlbGaaiikaiaadwgacaGGSaGaamOuaiaacMcacqGH9aqpcaWGLbWaaWbaaSqabeaacaaI0aaaaOWaaSaaaeaacaWGsbGaaiikaiaadkfacqGHsislcaWGHbGaaiykaiaacIcacaWGsbGaeyOeI0IaamOyaiaacMcaaeaacaGGOaGaaGymaiabgUcaRiaadkfacaGGPaWaaWbaaSqabeaacaaI0aaaaOGaaiikaiaaigdacqGHRaWkdaWcaaqaaiaadkfaaeaacaGGOaGaaGymaiabgUcaRiaadkfacaGGPaWaaWbaaSqabeaacaaIYaaaaaaakiaadwgadaahaaWcbeqaaiaaikdaaaGccaGGPaWaaWbaaSqabeaacaaIYaaaaaaaaaa@53BF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where a and b are the roots 2 &#177; &#8730;3 of the quadratic R<sup>2</sup> - 4R + 1. Thus, for e &#8800; 0, the kurtosis is negative if and only if (2 - &#8730;3) &lt; R &lt; (2 + &#8730;3). This in turn requires the smallest cluster weight, <it>&#960;</it><sub>min</sub>, to be in the range (approximately) of 0.21 &lt; <it>&#960;</it><sub>min </sub>&#8804; 0.5. It follows that for the case of approximately equal weights, where R &#8773; 1 (<it>&#960;</it><sub>min </sub>&#8773; 0.5), the kurtosis is always negative and decreases monotonically with increasing cluster separation, asymptotically approaching the lower bound -2 in the limit of large separations (when e >> 1). Thus, kurtosis provides a useful measure for ranking genes based on how platykurtic their profiles are.</p>
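            <p>The closed-form expression for K(e, R) can be checked numerically with a short sketch such as the one below. The function names <it>mixture_kurtosis</it> and <it>min_weight</it> are illustrative and are not part of any published PACK implementation; the code assumes, as in the text, two gaussians of equal variance.</p>
            <p>
```python
import math


def mixture_kurtosis(e, R):
    """Excess kurtosis of a two-gaussian mixture with equal variances.

    e is the separation in units of sigma; R is the ratio of cluster weights.
    """
    a = 2.0 + math.sqrt(3.0)
    b = 2.0 - math.sqrt(3.0)
    w = R / (1.0 + R) ** 2  # product of the two cluster weights
    numerator = e ** 4 * R * (R - a) * (R - b)
    denominator = (1.0 + R) ** 4 * (1.0 + w * e ** 2) ** 2
    return numerator / denominator


def min_weight(R):
    """Smaller of the two cluster weights for a given weight ratio R."""
    p1 = R / (1.0 + R)
    return min(p1, 1.0 - p1)


# Sign of the kurtosis for a balanced and an unbalanced mixture (e = 2).
balanced = mixture_kurtosis(2.0, 1.0)
unbalanced = mixture_kurtosis(2.0, 0.1)
# Smallest cluster weight at the boundary R = 2 - sqrt(3).
boundary_weight = min_weight(2.0 - math.sqrt(3.0))
```
            </p>
            <p>Evaluating <it>mixture_kurtosis</it> for equal weights and increasingly large e illustrates the approach toward the lower bound of -2, whereas weight ratios outside the interval between the two roots give positive values.</p>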
            <p>Given a gene's expression profile x = (x<sub>1</sub>, ..., x<sub>n</sub>), an unbiased estimate for the kurtosis <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> is as follows:</p>
            <p>
               <display-formula id="M3">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i4">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>K</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>x</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8793;</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:msubsup>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mi>n</m:mi>
                                    </m:msubsup>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>x</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mover accent="true">
                                                <m:mi>x</m:mi>
                                                <m:mo>&#175;</m:mo>
                                             </m:mover>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>4</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>2</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>3</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:msup>
                                    <m:mi>&#963;</m:mi>
                                    <m:mn>4</m:mn>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>&#8722;</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mn>3</m:mn>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                              </m:mrow>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>2</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>3</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaacaWGlbGaaiikaiaadIhacaGGPaGaeSywIe0aaSaaaeaacaWGUbGaaiikaiaad6gacqGHRaWkcaaIXaGaaiykamaaqadabaGaaiikaiaadIhadaWgaaWcbaGaamyAaaqabaGccqGHsislceWG4bGbaebacaGGPaWaaWbaaSqabeaacaaI0aaaaaqaaiaadMgacqGH9aqpcaaIXaaabaGaamOBaaqdcqGHris5aaGcbaGaaiikaiaad6gacqGHsislcaaIXaGaaiykaiaacIcacaWGUbGaeyOeI0IaaGOmaiaacMcacaGGOaGaamOBaiabgkHiTiaaiodacaGGPaGaeq4Wdm3aaWbaaSqabeaacaaI0aaaaaaakiabgkHiTmaalaaabaGaaG4maiaacIcacaWGUbGaeyOeI0IaaGymaiaacMcadaahaaWcbeqaaiaaikdaaaaakeaacaGGOaGaamOBaiabgkHiTiaaikdacaGGPaGaaiikaiaad6gacqGHsislcaaIZaGaaiykaaaaaaa@65B9@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i5"><m:semantics><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaceWG4bGbaebaaaa@3442@</m:annotation></m:semantics></m:math></inline-formula> and <it>&#963; </it>are the mean and standard deviation estimates of the profile. A standard error estimate of K was obtained by performing 10,000 random simulations, with <it>n </it>= 186 (number of ER<sup>- </sup>samples), which showed that the standard error estimate, 0.36, was essentially identical to the theoretical estimate, &#8730;(24/<it>n</it>) <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
            <p>Two notes on the feature selection step are in order. First, the kurtosis threshold used to select features depends on how large the smallest subgroup must be. Given the effective separation values that are typical for differential gene expression, we find that a zero kurtosis threshold (as used in this report) generally picks out subgroups within the individual gene expression profiles that are at least as large as 30% of the total sample size <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Second, in principle, genes defining major subclasses could be found using a clustering step to infer two clusters (PAC) and setting a lower bound threshold (for instance, 30%) on the size of the smallest cluster. This approach is computationally more expensive, however, because PAC attempts to estimate the optimal number of clusters in the profile. Nevertheless, this model selection step is necessary to ensure that profiles for which there is no objective evidence of bimodality are excluded (see below).</p>
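            <p>The kurtosis estimate K and the simulation check of its standard error described above can be sketched as follows. This is a minimal illustration in Python (the analyses reported here were run in R); the function names are hypothetical, the standard deviation is assumed to be the sample estimate with an n - 1 denominator, and the default of 2,000 simulations (rather than 10,000) is chosen only to keep the example quick.</p>

```python
import math
import random


def excess_kurtosis(x):
    """Bias-corrected sample excess kurtosis K(x), following the formula above."""
    n = len(x)
    mean = sum(x) / n
    # Assumption: sigma is the sample standard deviation (n - 1 denominator).
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    fourth = sum((v - mean) ** 4 for v in x)
    first = n * (n + 1) * fourth / ((n - 1) * (n - 2) * (n - 3) * var ** 2)
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second


def simulated_standard_error(n=186, n_sims=2000, seed=0):
    """Standard deviation of K over simulated Gaussian profiles of size n."""
    rng = random.Random(seed)
    ks = [excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(n_sims)]
    mean_k = sum(ks) / n_sims
    return math.sqrt(sum((k - mean_k) ** 2 for k in ks) / (n_sims - 1))


if __name__ == "__main__":
    print("simulated standard error:", round(simulated_standard_error(), 3))
    print("theoretical sqrt(24/n):  ", round(math.sqrt(24 / 186), 3))
```

            <p>With a zero threshold, a gene would be retained for the next step when excess_kurtosis returns a negative value for its expression profile.</p>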
         </sec>
         <sec>
            <st>
               <p>PAC: identification of robust prognostic markers</p>
            </st>
            <p>Having selected the genes defining the largest subclasses, we next apply PAC to each of these genes to remove those for which there is no evidence of bimodality (that is, Gaussian profiles that spuriously have negative kurtosis values). Specifically, given a gene's expression profile x = (x<sub>1</sub>, ..., x<sub>n</sub>), we model this as a random sample of a univariate random variable X, whose density function is possibly a mixture of Gaussians:</p>
            <p>
               <display-formula id="M4">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-8-r157-i6">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>x</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:mi>&#952;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>C</m:mi>
                                       <m:mi>M</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>&#960;</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mi>G</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>x</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>&#956;</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>&#963;</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaacaWGWbGaaiikaiaadIhadaWgaaWcbaGaamyAaaqabaGccaGG8bGaeqiUdeNaaiykaiabg2da9maaqahabaGaeqiWda3aaSbaaSqaaiaadUgaaeqaaOGaam4raiaacIcacaWG4bWaaSbaaSqaaiaadMgaaeqaaOGaaiiFaiabeY7aTnaaBaaaleaacaWGRbaabeaakiaacYcacqaHdpWCdaWgaaWcbaGaam4AaaqabaGccaGGPaaaleaacaWGRbGaeyypa0JaaGymaaqaaiaadoeadaWgaaadbaGaamytaaqabaaaniabggHiLdaaaa@50B0@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#960;</it><sub>k </sub>are the weights of the components, (<it>&#956;</it><sub>k</sub>, <it>&#963;</it><sub>k</sub>) are the mean and standard deviation of the univariate Gaussian component k, and <it>&#952; </it>denotes the set of all parameters. In the above, C<sub>M </sub>denotes the maximum number of clusters that can be inferred, which in our application we set to 2. The optimal number of clusters, C, can be inferred using one of various approaches. One possibility is to use the expectation-maximization (EM) algorithm to learn the parameters for the two different models C = 1 and C = 2, and perform model selection using the Bayesian Information Criterion (BIC) score <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>. Alternatively, the optimal number of clusters, C, can be inferred using a lower bound on the model evidence, as provided by a variational Bayesian (VB) approach <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>. The results we report here were obtained using the VB algorithm for model selection. Genes for which the inferred number of clusters was C = 1 were excluded from further analysis. Finally, association with the phenotype (here prognosis) was determined using Fisher's exact test to test whether poor outcome events were unevenly distributed across the two clusters.</p>
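            <p>To make the model selection step concrete, the following Python sketch illustrates the EM and BIC alternative mentioned above, followed by a two-sided Fisher's exact test. It is not the VB algorithm used for the reported results. All function names, the median-split initialisation, the floor on the component standard deviations, and the fixed number of EM iterations are assumptions made for this example.</p>

```python
import math
from statistics import NormalDist


def log_gauss(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def log_mix(v, pi, mu, sd):
    """Per-component terms log(pi_k) + log G(v | mu_k, sd_k)."""
    return [math.log(pi[k]) + log_gauss(v, mu[k], sd[k]) for k in (0, 1)]


def fit_one(x):
    """Maximised log-likelihood of the C = 1 model (non-constant profile)."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return sum(log_gauss(v, mu, sd) for v in x)


def fit_two(x, n_iter=200):
    """EM for the C = 2 model; returns (log-likelihood, hard cluster labels)."""
    n, xs, half = len(x), sorted(x), len(x) // 2
    mu_all = sum(x) / n
    sd_all = math.sqrt(sum((v - mu_all) ** 2 for v in x) / n)
    floor = 0.05 * sd_all  # assumed floor, stops a component collapsing
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sd, pi = [sd_all, sd_all], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in x:
            a = log_mix(v, pi, mu, sd)
            w = [math.exp(t - max(a)) for t in a]
            resp.append([w[0] / sum(w), w[1] / sum(w)])
        # M-step: weighted updates of weights, means and deviations.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-9)
            pi[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk
            sd[k] = max(math.sqrt(var), floor)
    ll, labels = 0.0, []
    for v in x:
        a = log_mix(v, pi, mu, sd)
        ll += max(a) + math.log(sum(math.exp(t - max(a)) for t in a))
        labels.append(0 if a[0] >= a[1] else 1)
    return ll, labels


def select_clusters(x):
    """Pick C = 1 or C = 2 by the lower BIC; returns (C, labels)."""
    n = len(x)
    bic1 = -2 * fit_one(x) + 2 * math.log(n)  # parameters: mean, sd
    ll2, labels = fit_two(x)
    bic2 = -2 * ll2 + 5 * math.log(n)  # two means, two sds, one weight
    if bic2 >= bic1:
        return 1, [0] * n
    return 2, labels


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        return math.comb(r1, k) * math.comb(r2, c1 - k) / math.comb(n, c1)

    p_obs = prob(a)
    ks = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(k) for k in ks if p_obs * (1 + 1e-7) >= prob(k))


if __name__ == "__main__":
    # Deterministic toy profiles built from evenly spaced Gaussian quantiles.
    q = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
    print(select_clusters(q)[0])
    print(select_clusters([v - 2.5 for v in q] + [v + 2.5 for v in q])[0])
```

            <p>For a gene with C = 2, the table passed to fisher_exact_two_sided would hold the counts of poor and good outcome samples in each of the two inferred clusters.</p>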
         </sec>
         <sec>
            <st>
               <p>Software packages used</p>
            </st>
            <p>All analyses were performed using the R statistical programming language <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The following add-on packages were used: vabayelMix for the PAC implementation, survival for the Cox regression models, qvalue for FDR estimation, and cluster for the partitioning around medoids (pam) clustering algorithm.</p>
         </sec>
         <sec>
            <st>
               <p>The SSP classifier</p>
            </st>
            <p>The classification of the samples in the NKI2, EMC, and NCH cohorts into the intrinsic subtypes was performed using the single sample predictor (SSP) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and was done for each cohort separately because this guaranteed a larger number of overlapping genes. In the SSP, samples were assigned the intrinsic subtype for which the corresponding Spearman rank correlation between the sample and SSP centroid was maximal <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
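            <p>The assignment rule described above can be sketched in Python as follows. This is an illustration only: the function names and the toy expression values are hypothetical, and the real SSP centroids are those of reference 23. Each sample is compared with each centroid over the genes they share, and the subtype with the largest Spearman rank correlation is returned.</p>

```python
import math


def ranks(values):
    """Ranks from 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treated as zero in this sketch
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def assign_subtype(sample, centroids):
    """sample: gene -> value; centroids: subtype -> (gene -> value)."""
    best_name, best_rho = None, -2.0
    for name, centroid in centroids.items():
        genes = sorted(set(sample).intersection(centroid))
        if not genes:
            continue  # no overlapping genes with this centroid
        rho = spearman([sample[g] for g in genes], [centroid[g] for g in genes])
        if rho > best_rho:
            best_name, best_rho = name, rho
    return best_name, best_rho


if __name__ == "__main__":
    # Toy values for illustration only.
    toy_centroids = {
        "subtype_A": {"g1": 2.0, "g2": 0.1, "g3": 1.0, "g4": -1.0},
        "subtype_B": {"g1": -1.0, "g2": 2.0, "g3": 0.0, "g4": 3.0},
    }
    toy_sample = {"g1": 5.0, "g2": 1.0, "g3": 3.0, "g4": 0.5}
    print(assign_subtype(toy_sample, toy_centroids))
```

            <p>Restricting each comparison to the shared genes mirrors the per-cohort treatment described above, in which the overlap with the SSP gene set is determined separately for each cohort.</p>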
         </sec>
         <sec>
            <st>
               <p>The ER<sup>- </sup>subclass centroids</p>
            </st>
            <p>From the hierarchical clustering (Pearson correlation metric and complete linkage; Figure <figr fid="F3">3a</figr>) we constructed mean centroids for each of the five subclasses. External samples were classified to these centroids using the nearest centroid criterion. Because these centroids were defined over ER<sup>- </sup>samples only, external samples (which may not be ER<sup>-</sup>) may not show strong correlation to any of these centroids. We thus validated, through 10,000 Monte Carlo (MC) randomisations, that samples with a maximal Pearson correlation coefficient larger than 0.25 were significantly correlated with the corresponding centroid (<it>P </it>&lt; 0.0001). Samples with maximal correlation coefficients smaller than 0.25 were deemed to be unclassifiable.</p>
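            <p>A minimal Python sketch of the nearest centroid rule with the 0.25 cut-off is given below. The function names and toy centroids are hypothetical. The randomisation scheme (shuffling a sample's values across genes to obtain a null distribution for the maximal correlation) is an assumption made for illustration, since the exact scheme is not specified above, and the number of permutations is reduced to keep the example quick.</p>

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treated as zero in this sketch
    return sxy / math.sqrt(sxx * syy)


def classify(sample, centroids, threshold=0.25):
    """Nearest centroid by Pearson correlation; None means unclassifiable."""
    best = max(centroids, key=lambda name: pearson(sample, centroids[name]))
    r = pearson(sample, centroids[best])
    return (best, r) if r > threshold else (None, r)


def empirical_p(sample, centroids, n_perm=1000, seed=0):
    """Assumed null: shuffle the sample's values across genes."""
    rng = random.Random(seed)
    observed = max(pearson(sample, c) for c in centroids.values())
    shuffled = list(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max = max(pearson(shuffled, c) for c in centroids.values())
        if null_max >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy centroids over six genes, for illustration only.
    toy = {"c1": [1, 2, 3, 4, 5, 6], "c2": [6, 5, 4, 3, 2, 1]}
    print(classify([1.1, 2.0, 2.9, 4.2, 5.1, 5.8], toy))
    print(classify([1, -1, 0, 0, -1, 1], toy))
```

            <p>In this sketch a sample whose best correlation does not exceed the threshold is returned with a label of None, corresponding to the unclassifiable samples described above.</p>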
         </sec>
         <sec>
            <st>
               <p>Expression based basal and HER2<sup>+ </sup>markers</p>
            </st>
            <p>The basal marker used in Figure <figr fid="F3">3</figr> was derived by first mapping ten validated basal markers (<it>CRYAB</it>, <it>ANXA8</it>, <it>LAMC2</it>, <it>LAMB3</it>, <it>ITGA6</it>, <it>KRT17</it>, <it>KRT15</it>, <it>KRT13</it>, <it>KRT6B</it>, and <it>KRT5</it>) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> onto the integrated data set of 5,007 genes. For each of these markers samples were ranked in order of decreasing expression or 'basalness'. For each sample in the integrated cohort an average rank was then computed over the ten basal markers. The average ranks were then rescaled onto the unit interval (0,1), with '1' indicating highest expression for basal markers. The marker for the <it>ERBB2 </it>subtype was obtained in an analogous manner using three genes in the <it>ERBB2 </it>amplicon (<it>ERBB2</it>, <it>GRB7</it>, and <it>STARD3</it>).</p>
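            <p>The rank-averaging construction described above can be sketched in Python as follows. The function names are hypothetical, and the linear min-max rescaling is an assumption, since the exact rescaling is not specified above. Ranks are taken so that the highest expression receives the largest rank, which gives a score of 1 to the sample with the highest average expression of the marker genes.</p>

```python
def ranks(values):
    """Ranks from 1 (lowest value), with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def marker_score(expr):
    """expr: marker gene -> list of expression values, one per sample."""
    n_samples = len(next(iter(expr.values())))
    avg = [0.0] * n_samples
    for values in expr.values():
        r = ranks(values)
        for i in range(n_samples):
            avg[i] += r[i] / len(expr)
    lo, hi = min(avg), max(avg)
    if hi == lo:
        return [0.5] * n_samples  # all samples tied; arbitrary midpoint
    # Assumed rescaling: linear map of the average ranks onto the unit interval.
    return [(a - lo) / (hi - lo) for a in avg]


if __name__ == "__main__":
    # Toy expression values for two of the basal markers and three samples.
    toy = {"KRT5": [1.0, 3.0, 2.0], "KRT17": [0.5, 2.5, 1.5]}
    print(marker_score(toy))
```

            <p>The same function would apply unchanged to the three <it>ERBB2 </it>amplicon genes to obtain the analogous marker.</p>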
         </sec>
         <sec>
            <st>
               <p>Lymphocyte infiltration scores</p>
            </st>
            <p>For the samples from our NCH cohort <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> we used the following scoring method. Lymphocytic infiltration (LI) was assessed in whole tumour sections from frozen sections stained with hematoxylin and eosin. The intensity of lymphocytic infiltrate was first graded semi-quantitatively as minimal or mild (1), moderate (2), and marked (3). The LI scores were then dichotomized (we considered mild and moderate as low LI and marked as high LI) to make them comparable with the binary LI scores used by van 't Veer and coworkers <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this manuscript. Additional data file <supplr sid="S1">1</supplr> is a table showing the 813 genes with negative kurtosis expression profiles over 186 ER<sup>- </sup>tumors, together with the predicted number of clusters and Fisher's test P value with outcome as binary phenotype. Additional data file <supplr sid="S2">2</supplr> is a table showing the 193 genes with negative kurtosis expression profiles over 527 ER<sup>+ </sup>samples, together with the predicted number of clusters and Fisher's test P value with outcome as binary phenotype. Additional data file <supplr sid="S3">3</supplr> is a figure showing hierarchical clustering over 186 ER<sup>- </sup>breast cancers (gene annotated version). Additional data file <supplr sid="S4">4</supplr> is a figure showing the distribution of basal and ERBB2 markers among ER<sup>- </sup>subtypes. Additional data file <supplr sid="S5">5</supplr> is a table showing the centroids of gene expression for each of the five identified ER<sup>- </sup>subtypes. Additional data file <supplr sid="S6">6</supplr> is a figure showing expression profiles of immune response module genes in ER<sup>- </sup>samples of the external UPP cohort. Additional data file <supplr sid="S7">7</supplr> is a figure showing expression profiles of immune response module genes in ER<sup>- </sup>samples of the external JRH-2 cohort. Additional data file <supplr sid="S8">8</supplr> is a figure showing the clustering of ER<sup>+ </sup>samples over the humoral immune response gene module in the two external UPP and JRH-2 cohorts.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Genes with negative kurtosis expression profiles in ER- breast cancer</p>
            </caption>
            <text>
               <p>Columns label the gene, the negative kurtosis of its expression profile over 186 ER- samples, the number of clusters predicted by PAC and Fisher's test P value testing for an association between outcome and the two clusters.</p>
            </text>
            <file name="gb-2007-8-8-r157-S1.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Genes with negative kurtosis expression profiles in ER+ breast cancer</p>
            </caption>
            <text>
               <p>Columns label the gene, the negative kurtosis of its expression profile over 527 ER+ samples, the number of clusters predicted by PAC and Fisher's test P value testing for an association between outcome and the two clusters.</p>
            </text>
            <file name="gb-2007-8-8-r157-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Hierarchical clustering over 186 ER- breast cancers: gene annotated version</p>
            </caption>
            <text>
               <p>Hierarchical clustering over 186 ER<sup>- </sup>breast cancers and 813 negative kurtosis profile genes selected using the PAC algorithm, as explained in the text. Five main clusters were identified and characterized in terms of over-expression of genes related to cell cycle (CC), immune response (IR), extracellular matrix (ECM), and steroid hormone response (SR) functions. Red denotes relative over-expression and green relative under-expression.</p>
            </text>
            <file name="gb-2007-8-8-r157-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Distribution of basal and ERBB2 markers among ER<sup>- </sup>subtypes</p>
            </caption>
            <text>
               <p><b>(A) </b>Hierarchical clustering dendrogram with the different ER<sup>- </sup>subtypes as defined by the clustering in Figure <figr fid="F2">2a</figr>. <b>(B) </b>The distribution of lymphocytic infiltration scores (LI) and histologic grade. Color codes: black = high LI and high grade; gray = low LI; blue = intermediate grade; and sky blue = low grade. <b>(C) </b>Expression profiles of validated basal markers from <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> across ER<sup>- </sup>subtypes. <b>(D) </b>Expression profiles of genes in the <it>ERBB2 </it>amplicon. Color codes: green = relative under-expression; red = relative over-expression.</p>
            </text>
            <file name="gb-2007-8-8-r157-S4.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Gene expression centroids for ER- subclasses</p>
            </caption>
            <text>
               <p>Table gives the gene expression centroids over the five identified ER- subclasses. Centroids were defined over the 813 genes with negative kurtosis expression profiles.</p>
            </text>
            <file name="gb-2007-8-8-r157-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Expression profiles of immune response module genes in ER<sup>- </sup>samples of external UPP cohort</p>
            </caption>
            <text>
               <p>Expression profiles (on a log2 scale) of immune response module genes in the validation ER<sup>- </sup>cohort UPP. Black indicates good outcome samples and red poor outcome samples. Clusters were inferred using the pam algorithm. Inferred clusters are indicated by different shapes (triangles and diamonds).</p>
            </text>
            <file name="gb-2007-8-8-r157-S6.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Expression profiles of immune response module genes in ER<sup>- </sup>samples of external JRH-2 cohort</p>
            </caption>
            <text>
               <p>Expression profiles (on a log2 scale) of immune response module genes in the validation ER<sup>- </sup>cohort JRH-2. Black indicates good outcome samples and red poor outcome samples. Clusters were inferred using the pam algorithm. Inferred clusters are indicated by different shapes (triangles and diamonds).</p>
            </text>
            <file name="gb-2007-8-8-r157-S7.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>Humoral immune response module in external ER<sup>+ </sup>cohorts</p>
            </caption>
            <text>
               <p>Heatmap of gene expression of the 11-gene humoral IR module in the ER<sup>+ </sup>samples of the <b>(A) </b>UPP and <b>(B) </b>JRH-2 cohorts. Shown are the clusters over-expressing (purple) and under-expressing (yellow) the humoral IR module as predicted by the pam algorithm. Good outcome samples are presented in gray and poor outcome samples in black. Green indicates relative under-expression, and red relative over-expression. <b>(C) </b>Kaplan-Meier survival curves over the combined external cohorts (for UPP the end-point was disease-specific survival, and for JRH-2 it was recurrence-free survival), with the number of events and samples in each of the two predicted groups.</p>
            </text>
            <file name="gb-2007-8-8-r157-S8.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This research was supported by grants from Cancer Research UK. We thank Ali Naderi for useful discussions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Molecular classification and molecular forecasting of breast cancer: ready for clinical application?</p>
            </title>
            <aug>
               <au>
                  <snm>Brenton</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Carey</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Ahmed</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Caldas</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Clin Oncol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>7350</fpage>
            <lpage>7360</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1200/JCO.2005.03.3845</pubid>
                  <pubid idtype="pmpid" link="fulltext">16145060</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Breast carcinoma with basal differentiation: a proposal for pathology definition based on basal cytokeratin expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Rakha</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>El-Sayed</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Paish</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
            </aug>
            <source>Histopathology</source>
            <pubdate>2007</pubdate>
            <volume>50</volume>
            <fpage>434</fpage>
            <lpage>438</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2559.2007.02638.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17448018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Gene expression profiling predicts clinical outcome of breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>van 't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>van de Vijver</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peterse</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>van der Kooy</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Witteveen</snm>
                  <fnm>AT</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>530</fpage>
            <lpage>536</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415530a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11823860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Paik</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baehner</snm>
                  <fnm>FL</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Watson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>N Engl J Med</source>
            <pubdate>2004</pubdate>
            <volume>351</volume>
            <fpage>2817</fpage>
            <lpage>2826</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1056/NEJMoa041588</pubid>
                  <pubid idtype="pmpid" link="fulltext">15591335</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Klijn</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sieuwerts</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Look</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Talantov</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Timmermans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meijer-van Gelder</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Lancet</source>
            <pubdate>2005</pubdate>
            <volume>365</volume>
            <fpage>671</fpage>
            <lpage>679</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15721472</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts.</p>
            </title>
            <aug>
               <au>
                  <snm>Pawitan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bjohle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Amler</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Egyhazi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Holmberg</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Klaar</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Breast Cancer Res</source>
            <pubdate>2005</pubdate>
            <volume>7</volume>
            <fpage>R953</fpage>
            <lpage>R964</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1410752</pubid>
                  <pubid idtype="pmpid" link="fulltext">16280042</pubid>
                  <pubid idtype="doi">10.1186/bcr1325</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis.</p>
            </title>
            <aug>
               <au>
                  <snm>Sotiriou</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wirapati</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Loi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smeds</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nordgren</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Farmer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Praz</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Haibe-Kains</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Natl Cancer Inst</source>
            <pubdate>2006</pubdate>
            <volume>98</volume>
            <fpage>262</fpage>
            <lpage>272</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16478745</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Multicenter validation of a gene expression-based prognostic signature in lymph node-negative primary breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Foekens</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Atkins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sweep</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Harbeck</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Paradiso</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cufer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sieuwerts</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Talantov</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Span</snm>
                  <fnm>PN</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Clin Oncol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>1665</fpage>
            <lpage>1671</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1200/JCO.2005.03.9115</pubid>
                  <pubid idtype="pmpid" link="fulltext">16505412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A gene-expression signature to predict survival in breast cancer across independent data sets.</p>
            </title>
            <aug>
               <au>
                  <snm>Naderi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Teschendorff</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Barbosa-Morais</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Pinder</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Powe</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
               <au>
                  <snm>Brenton</snm>
                  <fnm>JD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Oncogene</source>
            <pubdate>2007</pubdate>
            <volume>26</volume>
            <fpage>1507</fpage>
            <lpage>1516</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1209920</pubid>
                  <pubid idtype="pmpid" link="fulltext">16936776</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A consensus prognostic gene expression classifier for ER positive breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Teschendorff</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Naderi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barbosa-Morais</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Pinder</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brenton</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Caldas</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>R101</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1794561</pubid>
                  <pubid idtype="pmpid" link="fulltext">17076897</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-10-r101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Expression of cytokeratins 17 and 5 identifies a group of breast carcinomas with poor clinical outcome.</p>
            </title>
            <aug>
               <au>
                  <snm>van de Rijn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kallioniemi</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Kononen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Torhorst</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sauter</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zuber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kochli</snm>
                  <fnm>OR</fnm>
               </au>
               <etal/>
            </aug>
            <source>Am J Pathol</source>
            <pubdate>2002</pubdate>
            <volume>161</volume>
            <fpage>1991</fpage>
            <lpage>1996</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1850928</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466114</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Biological and prognostic significance of stratified epithelial cytokeratins in infiltrating ductal breast carcinomas.</p>
            </title>
            <aug>
               <au>
                  <snm>Malzahn</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mitze</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thoenes</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moll</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Virchows Arch</source>
            <pubdate>1998</pubdate>
            <volume>433</volume>
            <fpage>119</fpage>
            <lpage>129</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004280050226</pubid>
                  <pubid idtype="pmpid" link="fulltext">9737789</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Basal phenotype identifies a poor prognostic subgroup of breast cancer of clinical importance.</p>
            </title>
            <aug>
               <au>
                  <snm>Rakha</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>El-Rehim</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Paish</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Blamey</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Macmillan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
            </aug>
            <source>Eur J Cancer</source>
            <pubdate>2006</pubdate>
            <volume>42</volume>
            <fpage>3149</fpage>
            <lpage>3156</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ejca.2006.08.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">17055256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Prognostic markers in triple-negative breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Rakha</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>El-Sayed</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>IO</fnm>
               </au>
            </aug>
            <source>Cancer</source>
            <pubdate>2007</pubdate>
            <volume>109</volume>
            <fpage>25</fpage>
            <lpage>32</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/cncr.22381</pubid>
                  <pubid idtype="pmpid" link="fulltext">17146782</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Basal-like phenotype is not associated with patient survival in estrogen-receptor-negative breast cancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Jumppanen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gruvberger-Saal</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kauraniemi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tanner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bendahl</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Lundin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kataja</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Borg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ferno</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Breast Cancer Res</source>
            <pubdate>2007</pubdate>
            <volume>9</volume>
            <fpage>R16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1851391</pubid>
                  <pubid idtype="pmpid" link="fulltext">17263897</pubid>
                  <pubid idtype="doi">10.1186/bcr1649</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>'Good Old' clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers.</p>
            </title>
            <aug>
               <au>
                  <snm>Eden</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ritz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ferno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Eur J Cancer</source>
            <pubdate>2004</pubdate>
            <volume>40</volume>
            <fpage>1837</fpage>
            <lpage>1841</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ejca.2004.02.025</pubid>
                  <pubid idtype="pmpid" link="fulltext">15288284</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Teschendorff</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Naderi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barbosa-Morais</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Caldas</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>2269</fpage>
            <lpage>2275</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl174</pubid>
                  <pubid idtype="pmpid" link="fulltext">16682424</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>A gene-expression signature as a predictor of survival in breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>van de Vijver</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>van't Veer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Voskuil</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Peterse</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>N Engl J Med</source>
            <pubdate>2002</pubdate>
            <volume>347</volume>
            <fpage>1999</fpage>
            <lpage>2009</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1056/NEJMoa021967</pubid>
                  <pubid idtype="pmpid" link="fulltext">12490681</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Smeds</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vega</snm>
                  <fnm>VB</fnm>
               </au>
               <au>
                  <snm>Vergara</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ploner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pawitan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Klaar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Bergh</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>13550</fpage>
            <lpage>13555</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1197273</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141321</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506230102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Statistical significance for genomewide studies.</p>
            </title>
            <aug>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>9440</fpage>
            <lpage>9445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">170937</pubid>
                  <pubid idtype="pmpid" link="fulltext">12883005</pubid>
                  <pubid idtype="doi">10.1073/pnas.1530509100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Repeated observation of breast tumor subtypes in independent gene expression data sets.</p>
            </title>
            <aug>
               <au>
                  <snm>Sorlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Parker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Marron</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Nobel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pesich</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Geisler</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>8418</fpage>
            <lpage>8423</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">166244</pubid>
                  <pubid idtype="pmpid" link="fulltext">12829800</pubid>
                  <pubid idtype="doi">10.1073/pnas.0932692100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Breast cancer classification and prognosis based on gene expression profiles from a population-based study.</p>
            </title>
            <aug>
               <au>
                  <snm>Sotiriou</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Neo</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>McShane</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Korn</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Jazaeri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Martiat</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ET</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>10393</fpage>
            <lpage>10398</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193572</pubid>
                  <pubid idtype="pmpid" link="fulltext">12917485</pubid>
                  <pubid idtype="doi">10.1073/pnas.1732912100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The molecular portraits of breast tumors are conserved across microarray platforms.</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Oh</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Marron</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Qaqish</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Livasy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Carey</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Reynolds</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dressler</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>96</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1468408</pubid>
                  <pubid idtype="pmpid" link="fulltext">16643655</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-96</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Identification of molecular apocrine breast tumours by microarray analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Farmer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bonnefoi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Becette</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Tubiana-Hulin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fumoleau</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Larsimont</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Macgrogan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bergh</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Oncogene</source>
            <pubdate>2005</pubdate>
            <volume>24</volume>
            <fpage>4660</fpage>
            <lpage>4671</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1208561</pubid>
                  <pubid idtype="pmpid" link="fulltext">15897907</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>An estrogen receptor-negative breast cancer subset characterized by a hormonally regulated transcriptional program and response to androgen.</p>
            </title>
            <aug>
               <au>
                  <snm>Doane</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Danso</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Donaton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hudis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gerald</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2006</pubdate>
            <volume>25</volume>
            <fpage>3994</fpage>
            <lpage>4008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1209415</pubid>
                  <pubid idtype="pmpid" link="fulltext">16491124</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Molecular portraits of human breast tumours.</p>
            </title>
            <aug>
               <au>
                  <snm>Perou</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Sorlie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>van de Rijn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeffrey</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Pollack</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Johnsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Akslen</snm>
                  <fnm>LA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>747</fpage>
            <lpage>752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35021093</pubid>
                  <pubid idtype="pmpid" link="fulltext">10963602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Gene expression profiling of breast cell lines identifies potential new basal markers.</p>
            </title>
            <aug>
               <au>
                  <snm>Charafe-Jauffret</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ginestier</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Monville</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Finetti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Adelaide</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cervera</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Fekairi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Xerri</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jacquemier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Birnbaum</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bertucci</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2006</pubdate>
            <volume>25</volume>
            <fpage>2273</fpage>
            <lpage>2284</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1209254</pubid>
                  <pubid idtype="pmpid" link="fulltext">16288205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schmoyer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kirov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Snoddy</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>1</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">317364</pubid>
                  <pubid idtype="pmpid" link="fulltext">14706121</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-1</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <aug>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rousseeuw</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Finding Groups in Data: An Introduction to Cluster Analysis</source>
            <publisher>New York: Wiley</publisher>
            <pubdate>1990</pubdate>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gene expression predictors of breast cancer outcomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Dressman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pittman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tsou</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Horng</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Bild</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Iversen</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Lancet</source>
            <pubdate>2003</pubdate>
            <volume>361</volume>
            <fpage>1590</fpage>
            <lpage>1596</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0140-6736(03)13308-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12747878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Different patterns of inflammation and prognosis in invasive carcinoma of the breast.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Gillett</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fentiman</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Miles</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Millis</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Histopathology</source>
            <pubdate>2006</pubdate>
            <volume>48</volume>
            <fpage>692</fpage>
            <lpage>701</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2559.2006.02410.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16681685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Independent prognostic value of laminin receptor expression in breast cancer survival.</p>
            </title>
            <aug>
               <au>
                  <snm>Marques</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Franco</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Torloni</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brentani</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>da Silva-Neto</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Brentani</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>1990</pubdate>
            <volume>50</volume>
            <fpage>1479</fpage>
            <lpage>1483</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2137368</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Relationship of patient age to pathologic features of the tumor and prognosis for patients with stage I or II breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Nixon</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Neuberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Gelman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Connolly</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Schnitt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Abner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Recht</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vicini</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>J Clin Oncol</source>
            <pubdate>1994</pubdate>
            <volume>12</volume>
            <fpage>888</fpage>
            <lpage>894</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8164038</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Prognostic significance of HER-2/neu expression in breast cancer and its relationship to other prognostic factors.</p>
            </title>
            <aug>
               <au>
                  <snm>Rilke</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Colnaghi</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Cascinelli</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Andreola</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Baldini</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Bufalino</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Della Porta</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Menard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pierotti</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Testori</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Int J Cancer</source>
            <pubdate>1991</pubdate>
            <volume>49</volume>
            <fpage>44</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/ijc.2910490109</pubid>
                  <pubid idtype="pmpid">1678734</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Lymphocyte infiltrates as a prognostic variable in female breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Aaltomaa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lipponen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Eskelinen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kosma</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Marin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Alhava</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Syrjanen</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Eur J Cancer</source>
            <pubdate>1992</pubdate>
            <volume>28A</volume>
            <fpage>859</fpage>
            <lpage>864</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0959-8049(92)90134-N</pubid>
                  <pubid idtype="pmpid">1524909</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Prognostic significance of the Ackerman classification and other histopathological characteristics in breast cancer. An analysis of 1,349 consecutive cases with complete follow-up over seven years.</p>
            </title>
            <aug>
               <au>
                  <snm>Holmberg</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adami</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Lindgren</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ekbom</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sandstrom</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bergstrom</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>APMIS</source>
            <pubdate>1988</pubdate>
            <volume>96</volume>
            <fpage>979</fpage>
            <lpage>990</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3196478</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Prognostic significance of necrosis, elastosis, fibrosis and inflammatory cell reaction in operable breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Carlomagno</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Perrone</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Lauria</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>de Laurentiis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gallo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Morabito</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pettinato</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Panico</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bellelli</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Apicella</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Oncology</source>
            <pubdate>1995</pubdate>
            <volume>52</volume>
            <fpage>272</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7777238</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Gene expression profiling shows medullary breast cancer is a subgroup of basal breast cancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Bertucci</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Finetti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cervera</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Charafe-Jauffret</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mamessier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Adelaide</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Debono</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Houvenaeghel</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Maraninchi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Viens</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2006</pubdate>
            <volume>66</volume>
            <fpage>4636</fpage>
            <lpage>4644</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/0008-5472.CAN-06-0031</pubid>
                  <pubid idtype="pmpid" link="fulltext">16651414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Teschendorff</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Barbosa-Morais</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Brenton</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Caldas</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3025</fpage>
            <lpage>3033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti466</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860564</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <aug>
               <au>
                  <snm>Agresti</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Categorical Data Analysis. Wiley Series in Probability and Statistics</source>
            <publisher>New York: Wiley</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Integrative genomic approaches identify IKBKE as a breast cancer oncogene.</p>
            </title>
            <aug>
               <au>
                  <snm>Boehm</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Firestein</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Sjostrom</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Garraway</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Weremowicz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>AL</fnm>
               </au>
               <etal/>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>129</volume>
            <fpage>1065</fpage>
            <lpage>1079</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.03.052</pubid>
                  <pubid idtype="pmpid" link="fulltext">17574021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>BRCA1 regulates IFN-gamma signaling through a mechanism involving the type I IFNs.</p>
            </title>
            <aug>
               <au>
                  <snm>Buckley</snm>
                  <fnm>NE</fnm>
               </au>
               <au>
                  <snm>Hosey</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Gorski</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Purcell</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Mulligan</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Harkin</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Mullan</snm>
                  <fnm>PB</fnm>
               </au>
            </aug>
            <source>Mol Cancer Res</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>261</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/1541-7786.MCR-06-0250</pubid>
                  <pubid idtype="pmpid" link="fulltext">17374731</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>The pattern of clinical breast cancer metastasis correlates with a single nucleotide polymorphism in the C1qA component of complement.</p>
            </title>
            <aug>
               <au>
                  <snm>Racila</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Racila</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Ritchie</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dahle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Weiner</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Immunogenetics</source>
            <pubdate>2006</pubdate>
            <volume>58</volume>
            <fpage>1</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00251-005-0077-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">16465510</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Role of the integrin-binding protein osteopontin in lymphatic metastasis of breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Allan</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vantyghem</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Hodgson</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Holliday</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Girvan</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Postenka</snm>
                  <fnm>CO</fnm>
               </au>
               <etal/>
            </aug>
            <source>Am J Pathol</source>
            <pubdate>2006</pubdate>
            <volume>169</volume>
            <fpage>233</fpage>
            <lpage>246</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1698777</pubid>
                  <pubid idtype="pmpid" link="fulltext">16816376</pubid>
                  <pubid idtype="doi">10.2353/ajpath.2006.051152</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Association of S100A4 and osteopontin with specific prognostic factors and survival of patients with minimally invasive breast cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>de Silva Rudland</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Roshanlall</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Winstanley</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Leinster</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Platt-Higgins</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Carroll</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>West</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Barraclough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rudland</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Clin Cancer Res</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <fpage>1192</fpage>
            <lpage>1200</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/1078-0432.CCR-05-1580</pubid>
                  <pubid idtype="pmpid" link="fulltext">16489073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Alter</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>3351</fpage>
            <lpage>3356</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">152296</pubid>
                  <pubid idtype="pmpid" link="fulltext">12631705</pubid>
                  <pubid idtype="doi">10.1073/pnas.0530258100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Diagnosis of multiple cancer types by shrunken centroids of gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Narasimhan</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>6567</fpage>
            <lpage>6572</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">124443</pubid>
                  <pubid idtype="pmpid" link="fulltext">12011421</pubid>
                  <pubid idtype="doi">10.1073/pnas.082099299</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The Ensembl genome database project.</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>38</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99161</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752248</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.38</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Kurtosis: a critical review.</p>
            </title>
            <aug>
               <au>
                  <snm>Balanda</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>MacGillivray</snm>
                  <fnm>HL</fnm>
               </au>
            </aug>
            <source>Am Stat</source>
            <pubdate>1988</pubdate>
            <volume>42</volume>
            <fpage>111</fpage>
            <lpage>119</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2684482</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <aug>
               <au>
                  <snm>Snedecor</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Cochran</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>Statistical Methods</source>
            <publisher>Ames, IA: Iowa State University Press</publisher>
            <edition>6</edition>
            <pubdate>1967</pubdate>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Estimating the dimension of a model.</p>
            </title>
            <aug>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Ann Stat</source>
            <pubdate>1978</pubdate>
            <volume>6</volume>
            <fpage>461</fpage>
            <lpage>464</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/aos/1176344136</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Model-based clustering and data transformations for gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Yeung</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Fraley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Murua</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Raftery</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>977</fpage>
            <lpage>987</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.10.977</pubid>
                  <pubid idtype="pmpid" link="fulltext">11673243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Inferring parameters and structure of latent variable models by variational Bayes.</p>
            </title>
            <aug>
               <au>
                  <snm>Attias</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence; 30-31 July 1999; Stockholm, Sweden</source>
            <publisher>San Francisco, CA: Morgan Kaufmann</publisher>
            <pubdate>1999</pubdate>
            <fpage>21</fpage>
            <lpage>30</lpage>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Developments in probabilistic modelling with neural networks - ensemble learning.</p>
            </title>
            <aug>
               <au>
                  <snm>MacKay</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Neural Networks: Artificial Intelligence and Industrial Applications. Proceedings of the 3rd Annual Symposium on Neural Networks: 14-15 September 1995; Nijmegen, The Netherlands</source>
            <publisher>Berlin: Springer</publisher>
            <pubdate>1995</pubdate>
            <fpage>191</fpage>
            <lpage>198</lpage>
         </bibl>
         <bibl id="B55">
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R: a language and environment for statistical computing</source>
            <publisher>Vienna, Austria: R Foundation for Statistical Computing</publisher>
            <pubdate>2003</pubdate>
            <url>http://www.R-project.org</url>
         </bibl>
      </refgrp>
   </bm>
</art>
