<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>gb-2011-12-8-r80</ui><ji>1465-6906</ji><fm>
<dochead>Method</dochead>
<bibl>
<title>
<p>Single-cell copy number variation detection</p>
</title>
<aug>
<au id="A1"><snm>Cheng</snm><fnm>Jiqiu</fnm><insr iid="I1"/><insr iid="I2"/><email>jiqiu.cheng@esat.kuleuven.be</email></au>
<au id="A2"><snm>Vanneste</snm><fnm>Evelyne</fnm><insr iid="I3"/><email>evelyne.vanneste@med.kuleuven.be</email></au>
<au id="A3"><snm>Konings</snm><fnm>Peter</fnm><insr iid="I1"/><insr iid="I2"/><email>peter.konings@esat.kuleuven.be</email></au>
<au id="A4"><snm>Voet</snm><fnm>Thierry</fnm><insr iid="I3"/><email>thierry.voet@med.kuleuven.be</email></au>
<au id="A5"><snm>Vermeesch</snm><mi>R</mi><fnm>Joris</fnm><insr iid="I3"/><email>joris.vermeesch@uzkuleuven.be</email></au>
<au ca="yes" id="A6"><snm>Moreau</snm><fnm>Yves</fnm><insr iid="I1"/><insr iid="I2"/><email>yves.moreau@esat.kuleuven.be</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Electrical Engineering, Esat-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven 3001, Belgium</p></ins>
<ins id="I2"><p>IBBT-K.U.Leuven Future Health Department, Kasteelpark Arenberg 10, Leuven 3001, Belgium</p></ins>
<ins id="I3"><p>Center for Human Genetics, Katholieke Universiteit Leuven, Herestraat 49, Leuven 3000, Belgium</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>8</issue>
<fpage>R80</fpage>
<url>http://genomebiology.com/content/12/8/R80</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2011-12-8-r80</pubid><pubid idtype="pmpid">21854607</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>6</day><month>6</month><year>2011</year></date></rec><revrec><date><day>9</day><month>8</month><year>2011</year></date></revrec><acc><date><day>19</day><month>8</month><year>2011</year></date></acc><pub><date><day>19</day><month>8</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Cheng et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<p>Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data.</p>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Array analysis of single-cell copy number variations (CNVs) is a recently developed experimental technique for the detection of chromosomal rearrangements in single cells <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>. Two-color single-cell array comparative genomic hybridization (CGH) assays the copy number difference between a euploid reference sample from genomic DNA and an unknown test sample from amplified single-cell DNA by comparing signal intensities using log2 ratios <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. However, the accurate detection of single-cell CNV has been hampered by the noise levels in the log2 ratios caused by the amplification of the minute quantities of DNA present in a single cell. Moreover, since the reference DNA in single-cell array experiments is non-amplified genomic DNA extracted from a large number of cells <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>, the biological nature of test and reference sample is different, resulting in new genome artifacts <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. Unfortunately, existing normalization strategies do not provide clear guidelines for checking for these artifacts, nor for handling them appropriately.</p>
<p>Among existing array CGH normalization methods, global loess normalization is commonly used <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. Global loess normalization regresses the log2 ratios between test and reference samples on intensities using all probes <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>. The snapCGH package commonly used for analyzing array CGH data has included the global loess normalization method <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. Furthermore, poplowess and CGHnormaliter have been developed for array CGH data <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>. Poplowess attempts to separate normal from aberrant probes using k-means clustering and applies the loess normalization based on the largest group of probes, whereas CGHnormaliter combines a segmentation algorithm with loess normalization iteratively and normalizes data based on segmented normal probes. Although these two methods are supposed to help correctly recognize real chromosomal aberrations, they are not able to correct genome artifacts and could result in false calling of aberrations. Alternatively, the smoothing wave algorithm has been devised to remove genome artifacts that are either related to the GC content or other unknown factors <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>. However, this method requires calibrated genome profiles that are typically not available in the single-cell setup. Recently, more advanced algorithms have been proposed based on the combination of normalization, segmentation, and copy number calling <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>. These algorithms allow simultaneous normalization and segmentation and are expected to jointly improve the CNV detection performance. However, these advanced algorithms have been developed for genomic array CGH data and not for single-cell array CGH data, which has an additional artifact-causing property compared to genomic data. All of these normalization methods have in common that they normalize data on the ratio of both channels without taking the single-cell amplification bias and genome artifacts into account.</p>
<p>In this paper, we present a new normalization approach based on channel and clone-specific artifact corrections, named channel clone normalization, to remove the amplification bias caused by the different natures of test and reference samples. Moreover, this approach removes genome artifacts that obscure the detection of real aberrations. The explorations of the amplification bias and genome artifacts are shown in the Results section. Furthermore, we compare our newly developed method to several existing normalization methods (global loess, poplowess, and CGHnormaliter) as well as to the methods combining normalization and segmentation (Haarseg, genome alteration detection analysis (GADA), and circular binary segmentation (CBS) combined normalization) <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>. The significant performance improvement of our channel clone normalization method is shown for both simulated and real single-cell array CGH data.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Simulation of single-cell data</p>
</st>
<p>To quantify the effect of the channel clone normalization, we simulated 15 samples containing 23 artificial aberrations, based on the 7 real Epstein-Barr virus (EBV)-transformed samples described in the Application section. The simulation details are presented in the Materials and methods section. The simulated data set retains the genome profile features of real single-cell array CGH data while containing known artificial aberrations. The overall performance of all normalization methods on the simulated data set is shown in Figure <figr fid="F1">1</figr>. The true positive rates (TPRs) using global loess, CGHnormaliter, poplowess, and channel clone normalization are 0.97, 0.94, 0.92, and 0.96, respectively, whereas the false positive rates (FPRs) are 0.06, 0.08, 0.08 and 0, respectively. Although channel clone normalization missed 1 of the 23 known aberrations, it offers the best overall performance, with the fewest falsely discovered CNV regions and a comparable TPR. Global loess, CGHnormaliter, and poplowess show similar CNV detection performance in terms of TPR and FPR.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Barplot of true positive rate and false positive rate of 15 simulated samples</p></caption><text>
   <p><b>Barplot of true positive rate and false positive rate of 15 simulated samples</b>. All the true positive rates (TPRs) and false positive rates (FPRs) were calculated after the global loess, CGHnormaliter, poplowess or channel clone normalization methods.</p>
</text><graphic file="gb-2011-12-8-r80-1"/></fig>
<p>An example shown in Figure <figr fid="F2">2</figr> illustrates the correction of genome artifacts by channel clone normalization. Chromosome 10 of sample 4 contains a confirmed duplication on the q-arm. This duplication was correctly detected by all four normalization methods. However, the chromosome 10 q-terminal region was incorrectly detected as a deletion using global loess, CGHnormaliter, and poplowess. In contrast, this genome artifact was corrected by the channel clone normalization method and detected as a normal region.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Copy number variation detection in chromosome 10 of a simulated sample</p></caption><text>
   <p><b>Copy number variation detection in chromosome 10 of a simulated sample</b>. (a-d) CBS segmentation of chromosome 10 from the simulated sample 4 using global loess normalization (a), CGHnormaliter (b), poplowess (c), and channel clone normalization (d). The blue line represents the CBS segmentation line. The red and green regions represent the deletions and duplications, respectively, called by CGHcall.</p>
</text><graphic file="gb-2011-12-8-r80-2"/></fig>
</sec>
<sec>
<st>
<p>Application 1: single EBV-transformed lymphoblastoid cell array CGH</p>
</st>
<p>We analyzed seven single EBV-transformed lymphoblastoid cells amplified according to the previously described protocol <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Each of these amplified single-cell DNA samples was hybridized as a test sample on Agilent 244 K arrays against genomic non-amplified DNA derived from a patient with Klinefelter syndrome (47, XXY). The aberrant and diploid regions were validated on the corresponding genomic DNA using a 250 K Affymetrix SNP array with the help of SNP copy number, loss-of-heterozygosity, and heterozygous SNPs. The karyotype of each EBV-transformed sample is shown in Table <tblr tid="T1">1</tblr>. We used this data set to evaluate our approach and benchmark it against other methods.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Detected aberration length and true positive rate for each EBV-transformed cell after different normalization methods</p></caption><tblbdy cols="10">
      <r>
         <c ca="left">
            <p>
               <b>Real aberrations<sup>a</sup></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Global loess</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CGHnormaliter</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Poplowess</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Haarseg</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CGprobeA</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CG</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CA</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CGACBS</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Channel clone</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="10">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 617, 14 M, 18p ter del</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>13.62</p>
         </c>
         <c ca="center">
            <p>13.72</p>
         </c>
         <c ca="center">
            <p>13.56</p>
         </c>
         <c ca="center">
            <p>14.18</p>
         </c>
         <c ca="center">
            <p>14.86</p>
         </c>
         <c ca="center">
            <p>12.05</p>
         </c>
         <c ca="center">
            <p>14.18</p>
         </c>
         <c ca="center">
            <p>11.99</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1151, 9.3 M, 18p, dup</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>9.04</p>
         </c>
         <c ca="center">
            <p>8.87</p>
         </c>
         <c ca="center">
            <p>6.21</p>
         </c>
         <c ca="center">
            <p>9.06</p>
         </c>
         <c ca="center">
            <p>8.92</p>
         </c>
         <c ca="center">
            <p>8.87</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1151, 1.7 M, 20p ter del</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>1.70</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1151, monosomy X</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.59</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
         <c ca="center">
            <p>151.87</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1160, 3 M, 11 qter del</p>
         </c>
         <c ca="center">
            <p>2.22</p>
         </c>
         <c ca="center">
            <p>1.73</p>
         </c>
         <c ca="center">
            <p>2.56</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>2.20</p>
         </c>
         <c ca="center">
            <p>2.22</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>2.22</p>
         </c>
         <c ca="center">
            <p>2.56</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1162, 47.5 M, 14q dup</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>40.14</p>
         </c>
         <c ca="center">
            <p>31.97</p>
         </c>
         <c ca="center">
            <p>45.78</p>
         </c>
         <c ca="center">
            <p>47.39</p>
         </c>
         <c ca="center">
            <p>39.47</p>
         </c>
         <c ca="center">
            <p>47.39</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1162, 58 M, chr X, del</p>
         </c>
         <c ca="center">
            <p>59.94</p>
         </c>
         <c ca="center">
            <p>59.94</p>
         </c>
         <c ca="center">
            <p>59.94</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>59.93</p>
         </c>
         <c ca="center">
            <p>59.93</p>
         </c>
         <c ca="center">
            <p>57.30</p>
         </c>
         <c ca="center">
            <p>59.92</p>
         </c>
         <c ca="center">
            <p>57.30</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cell 1168, trisomy 21</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>36.99</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TPR</p>
         </c>
         <c ca="center">
            <p>0.66</p>
         </c>
         <c ca="center">
            <p>0.71</p>
         </c>
         <c ca="center">
            <p>0.71</p>
         </c>
         <c ca="center">
            <p>0.66</p>
         </c>
         <c ca="center">
            <p>0.83</p>
         </c>
         <c ca="center">
            <p>0.86</p>
         </c>
         <c ca="center">
            <p>0.86</p>
         </c>
         <c ca="center">
            <p>0.86</p>
         </c>
         <c ca="center">
            <p>0.98</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p><sup>a</sup>Validated by Affymetrix 250 K array using genomic DNA. The first column represents the true validated aberrations of each EBV cell, followed by the detected aberration length after global loess, CGHnormaliter, poplowess, Haarseg, CGprobeA, CG, CA, CGACBS and channel clone normalization methods. The column with bold numbers shows the detected aberration length and true positive rate after the channel clone normalization. CA, channel standardization followed by recurrent genome artifact correction without CBS segmentation; CG, channel standardization followed by genome composition correction using enlarged window GC contents; CGACBS, channel standardization followed by genome composition correction using enlarged window GC contents and recurrent genome artifact correction with CBS segmentation; CGprobeA, channel standardization followed by genome composition correction using probe GC contents and recurrent genome artifact correction without CBS segmentation; Channel clone, channel standardization followed by genome composition correction using enlarged window GC contents and recurrent genome artifact correction without CBS segmentation.</p>
   </tblfn></tbl>
<p>Our normalization approach consists of three main steps: channel standardization, genome composition artifact correction and recurrent genome artifact correction. All three steps are necessary to improve single-cell CNV detection. The investigation of the single-cell amplification bias is covered in the 'Exploration of the amplification bias' section and the exploration of genome artifacts is covered in the 'Detection of copy number variation' section.</p>
<sec>
<st>
<p>Exploration of the amplification bias</p>
</st>
<p>We first explored the amplification bias caused by the different natures of the test and reference samples with the help of graphical plots. MA, density, and quantile-quantile (QQ) plots are used to check for potential artifacts before and after normalization. The <it>y</it>-axis and <it>x</it>-axis of an MA plot represent the log2 ratios and the average log2 intensities of the two hybridized samples, respectively. The points of an MA plot should be scattered randomly around zero on the <it>y</it>-axis if no large aberrations or artifacts exist in the data. The density plot and QQ plot are graphical techniques to show the similarity between the intensity distributions of the test and reference samples. If the test sample intensities are distributed similarly to the reference intensities, the density plots of the two hybridized samples should overlap and the points of the QQ plot should lie along the 45-degree line.</p>
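<p>The MA coordinates described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this study: the function name <it>ma_values</it> and the toy intensities are hypothetical, and the inputs are assumed to be strictly positive raw channel intensities.</p>

```python
import numpy as np

def ma_values(test_intensity, ref_intensity):
    """Return (M, A) for a two-color array.

    M is the log2 ratio of test over reference (y-axis of the MA plot);
    A is the mean of the two log2 intensities (x-axis of the MA plot).
    Inputs are assumed to be strictly positive raw intensities.
    """
    log_test = np.log2(np.asarray(test_intensity, dtype=float))
    log_ref = np.log2(np.asarray(ref_intensity, dtype=float))
    m = log_test - log_ref
    a = 0.5 * (log_test + log_ref)
    return m, a

# Toy intensities (hypothetical values, not taken from the study).
m, a = ma_values([400.0, 800.0, 100.0], [400.0, 400.0, 400.0])
print(m)  # log2 ratios
print(a)  # average log2 intensities
```

<p>Plotting M against A and overlaying a lowess curve then gives the diagnostic described in the text: a curve that departs from the horizontal indicates an intensity-dependent bias.</p>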
<p>An obvious intensity-dependent pattern is observed in the MA plot of all single-cell array CGH experiments (Figure <figr fid="F3">3a</figr>; Additional file <supplr sid="S1">1</supplr>). The pattern visualized using the red lowess smoothing line shows that the log2 ratio increases nonlinearly with the increase of the average intensities in the single-cell array CGH data. In contrast, the MA plot of an array CGH experiment using non-amplified genomic DNA shows no aberrant pattern (Figure <figr fid="F3">3b</figr>). Since both array CGH experiments were performed using the same series of Agilent 244K arrays and the only difference between them was the processing of the test samples, we suspect that the intensity-dependent pattern artifact is caused by the amplification of the single-cell DNA. This suspicion is confirmed by the larger standard deviation (SD) of the intensities in the amplified test sample compared to the non-amplified reference sample (Figure <figr fid="F4">4a</figr>). Consequently, the median SD of single-cell array CGH log2 ratios is 1.38, ranging from 0.85 to 1.44 across 7 arrays, whereas that of the genomic array CGH experiments is 0.28, ranging from 0.2 to 0.35 across 6 arrays. This larger SD of log2 ratios in the single-cell array CGH experiments hampers the accurate detection of CNVs at the single-cell level. It is thus necessary to remove this amplification bias.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>MA plot of a single EBV-transformed cell</p></caption><text>
   <p><b>MA plot of a single EBV-transformed cell</b>. (a-c) MA plot for EBV-transformed single lymphoblastoid cell 1162 before normalization (a), genomic DNA before normalization (b), and EBV-transformed single lymphoblastoid cell 1162 after channel standardization (c). The red line represents a lowess curve fitted to the data. Note that after normalization, most of the log2 ratio values are distributed randomly around zero.</p>
</text><graphic file="gb-2011-12-8-r80-3"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Figure S1 - MA plot of single-cell array CGH</b>. MA plot of EBV-transformed cell 1160. The spots in the plot are the clones, excluding internal controls and clones with incomplete physical annotation. The red spots represent clones with intensities more than five-fold lower than the median background intensity.</p>
</text>
<file name="gb-2011-12-8-r80-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Density plot of a single EBV-transformed cell</p></caption><text>
   <p><b>Density plot of a single EBV-transformed cell</b>. <b>(a,b) </b>Density plot for EBV-transformed single lymphoblastoid cell 1162 before normalization (a), and after channel standardization (b). The solid line represents the reference sample and the dashed line represents the test sample. Note that the SD of the intensities of the test sample (SD = 1.02) is larger than that of the reference sample (SD = 0.61). <b>(c,d) </b>QQ plot of the intensities between the test and the reference samples before normalization (c), and after channel standardization (d).</p>
</text><graphic file="gb-2011-12-8-r80-4"/></fig>
<p>After the data are normalized by the channel standardization step, the dependence of the log2 ratio on the average intensity disappears and the lowess curve fitted to the data is close to horizontal (Figure <figr fid="F3">3c</figr>). The intensity distributions of the reference and test samples are adjusted to have a mean of approximately zero and an SD of approximately 1 (Figure <figr fid="F4">4b</figr>). The QQ plot in Figure <figr fid="F4">4d</figr> shows that most points after the channel standardization are located around the 45-degree reference line, meaning that the intensities of the normalized test and reference samples follow similar distributions. We conclude that the amplification bias has been successfully removed by the channel standardization step.</p>
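<p>As a rough illustration of the idea behind channel standardization, the sketch below rescales each channel separately to mean zero and unit SD on the log2 scale. It is a simplified sketch under stated assumptions, not the procedure defined in the Materials and methods section: the function name <it>standardize_channels</it> is hypothetical, the data are simulated, and taking the difference of the two standardized channels as the normalized log ratio is an assumption made only for this example.</p>

```python
import numpy as np

def standardize_channels(test_intensity, ref_intensity):
    """Standardize each channel separately on the log2 scale.

    Each channel is centered to mean zero and scaled to unit SD, so a
    channel with inflated spread (for example after amplification) is
    brought onto the same scale as the other channel.
    Returns (z_test, z_ref, z_test - z_ref).
    """
    log_test = np.log2(np.asarray(test_intensity, dtype=float))
    log_ref = np.log2(np.asarray(ref_intensity, dtype=float))
    z_test = (log_test - log_test.mean()) / log_test.std(ddof=1)
    z_ref = (log_ref - log_ref.mean()) / log_ref.std(ddof=1)
    # Assumption for this sketch: the normalized log ratio is the
    # difference of the two standardized channels.
    return z_test, z_ref, z_test - z_ref

# Simulated intensities (hypothetical): the test channel has a larger
# spread than the reference channel.
rng = np.random.default_rng(0)
test = 2.0 ** rng.normal(8.0, 1.0, 1000)
ref = 2.0 ** rng.normal(9.0, 0.6, 1000)

z_test, z_ref, norm_ratio = standardize_channels(test, ref)
print(z_test.std(ddof=1), z_ref.std(ddof=1))
```

<p>Because both channels end up on the same scale, their density curves become comparable and the points of a QQ plot of one channel against the other are expected to fall near the 45-degree line.</p>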
</sec>
<sec>
<st>
<p>Detection of copy number variation</p>
</st>
<p>After the exploration of the amplification bias, we checked the impact of genome composition artifacts and recurrent genome artifacts on the performance of single-cell CNV detection using the CBS algorithm <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. Genome composition artifacts, appearing as incorrect chromosomal aberrations, are frequently observed in the array CGH data. These artifacts are illustrated in Figure S2a,b in Additional file <supplr sid="S2">2</supplr> with the low log2 ratios of the chromosome 1 p terminus and the chromosome 10 q terminus. Studies have shown that these genome composition artifacts could be caused by GC content as well as other unknown factors <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>.</p>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Figure S2 - genome profile of single-cell array CGH before and after genome composition correction</b>. Genome plots of EBV-transformed cell 1168. <b>(a,b) </b>Genome plots of chromosomes 1 and 10 before genome composition correction. <b>(c,d) </b>Genome plots of chromosomes 1 and 10 after genome composition correction. The red line represents a lowess curve fitted to the data.</p>
</text>
<file name="gb-2011-12-8-r80-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>We therefore use a genome composition correction step to correct the artifacts caused by GC content and a recurrent artifact correction step to correct unknown recurrent artifacts. For the genome composition correction step, we considered two possible methods: correction based on the GC content of (1) the probe sequence itself or (2) an enlarged window around the probe. Similarly, for the recurrent genome artifact correction we considered two methods: (1) CBS segmentation of the residuals followed by the recurrent genome artifact correction and (2) artifact correction without prior CBS segmentation. The details of the channel clone normalization are given in the Materials and methods section. We compare our channel clone approach with four sub-methods to show that the combination of channel standardization, genome composition artifact correction and recurrent genome artifact correction gives the best single-cell CNV detection performance: CG (channel plus genome composition correction using enlarged window GC contents); CA (channel plus recurrent genome artifact correction without CBS segmentation); CGprobeA (channel plus genome composition correction using probe GC contents plus recurrent genome artifact correction without CBS segmentation); CGACBS (channel plus genome composition correction using enlarged window GC contents plus CBS segmented residuals followed by recurrent genome artifact correction); channel clone (channel plus genome composition correction using enlarged window GC contents plus recurrent genome artifact correction without CBS segmentation).</p>
<p>The genome profiles before and after genome composition correction are shown in Additional file <supplr sid="S2">2</supplr>. The GC-content-related artifacts, which appear as a wave pattern in Figure S2a,b in Additional file <supplr sid="S2">2</supplr>, are adjusted after the genome composition correction, as shown in Figure S2c,d in Additional file <supplr sid="S2">2</supplr>. Similarly, Figure <figr fid="F5">5</figr> shows that the CNV detection performance of CG, with a TPR of 0.86 and an FPR of 0.06, is better than that of the methods that do not account for genome composition correction (for example, global loess, CGHnormaliter, and poplowess).</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Barplot of true positive rate and false positive rate of 7 EBV-transformed cells</p></caption><text>
   <p><b>Barplot of true positive rate and false positive rate of 7 EBV-transformed cells</b>. All the TPRs and FPRs were calculated after global loess, CGHnormaliter, poplowess, Haarseg, CG, CA, CGprobeA, CGACBS and channel clone normalization approaches.</p>
</text><graphic file="gb-2011-12-8-r80-5"/></fig>
<p>Different studies have used genome composition corrections to correct the genome wave pattern <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. Array CGH hybridization is influenced not only by the GC content of the probe sequence but also by that of the DNA sequence in an enlarged window around the probe, corresponding to the DNA fragment to which the probe hybridizes. Diskin <it>et al. </it>
<abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> used an ordinary linear regression model to regress the log2 ratio on the GC content of a fixed 1 Mb window around the probe to correct the genome composition artifacts. Since this method was developed for single-channel arrays and cannot be directly applied to two-color arrays, we developed a comparable but more elaborate genome composition correction approach. First, to account for the GC content of the unknown genome fragments, our method extracts the GC percentage from different window sizes around each probe and selects the window size whose GC content correlates most strongly with the log2 ratio for the genome composition correction. Second, in contrast with Diskin <it>et al</it>.'s method, we use a weighted linear regression model with larger weights for the GC-rich probes to avoid the overcorrection of real chromosomal aberrations. Other genome correction methods could also be valid. However, comparison of all GC correction methods is outside the scope of our study. To show that accounting for the GC content from enlarged window sizes improves the genome composition correction, we also performed the correction based only on the GC content of each probe, as in the CGprobeA normalization. Figure <figr fid="F5">5</figr> shows that the TPR and FPR values are 0.86 and 0.015, respectively, for the CGprobeA normalization method, whereas the values for our channel clone normalization are 0.98 and 0.006, respectively. This comparison confirms the importance of finding the optimal GC-content window for the genome composition correction.</p>
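<p>The two ideas in the preceding paragraph, choosing the GC window most correlated with the log2 ratio and then fitting a weighted linear regression, can be sketched as follows. This is an illustrative sketch on simulated data, not the implementation used in this study. The function names, the window labels ("probe", "10kb", "100kb") and the choice of using the GC fraction itself as the regression weight are all assumptions made for the example; the actual window sizes and weighting scheme are those given in the Materials and methods section.</p>

```python
import numpy as np

def select_gc_window(log_ratio, gc_by_window):
    """Return the window label whose GC content has the largest
    absolute Pearson correlation with the log2 ratios."""
    def abs_corr(label):
        return abs(np.corrcoef(gc_by_window[label], log_ratio)[0, 1])
    return max(gc_by_window, key=abs_corr)

def weighted_gc_correction(log_ratio, gc, weights):
    """Fit log_ratio = b0 + b1 * gc by weighted least squares and
    return (residuals, coefficients)."""
    design = np.column_stack([np.ones_like(gc), gc])
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted least squares into
    # an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(design * root_w[:, None],
                               log_ratio * root_w, rcond=None)
    return log_ratio - design @ beta, beta

# Simulated probes (hypothetical): only the "100kb" window drives the signal.
rng = np.random.default_rng(1)
n = 500
gc_by_window = {
    "probe": rng.uniform(0.3, 0.7, n),
    "10kb": rng.uniform(0.3, 0.7, n),
    "100kb": rng.uniform(0.3, 0.7, n),
}
log_ratio = 2.0 * (gc_by_window["100kb"] - 0.5) + rng.normal(0.0, 0.05, n)

best = select_gc_window(log_ratio, gc_by_window)
gc = gc_by_window[best]
# Assumption: larger weights for GC-rich probes, here the GC fraction itself.
corrected, beta = weighted_gc_correction(log_ratio, gc, weights=gc)
print(best, beta)
```

<p>The residuals returned by the weighted fit play the role of the GC-corrected log2 ratios in this sketch; in the simulated example the GC-driven trend is absent from them.</p>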
<p>The impact of the recurrent genome artifact correction on each chromosome is detailed in Additional file <supplr sid="S3">3</supplr> and shown in Additional files <supplr sid="S4">4</supplr> to <supplr sid="S10">10</supplr>. For instance, chromosome 3 of EBV-transformed cell 1168 was experimentally confirmed to have no aberrations. However, two deletions, one at around 50 Mb and one in the q-arm terminal region, were observed when no correction was applied (Figure S3a in Additional file <supplr sid="S3">3</supplr>). The estimated common profile of chromosome 3 (Figure S3b in Additional file <supplr sid="S3">3</supplr>) shows artifacts at the same locations as in the individual profile of EBV-transformed cell 1168. Since the common profile is estimated across all the EBV-transformed samples, the artifacts observed in the common profile represent recurrent genome artifacts present in multiple EBV-transformed samples. Figure S3c in Additional file <supplr sid="S3">3</supplr> shows that after subtraction of the estimated common profile, these two artifacts have been removed and the segmentation line of this chromosomal profile is horizontal around the zero line.</p>
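<p>The recurrent genome artifact correction can be sketched as follows, based on the description in the Materials and methods: a trimmed mean per probe across samples, a smoothed common profile trend, and subtraction of that trend from every sample. This is a minimal Python sketch, not the R code of this study. The function names, the trim fraction and the toy data are assumptions, and a simple moving average stands in for the spline model used to estimate the trend.</p>

```python
def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    s = sorted(values)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def common_profile(samples, trim=0.2, half_window=1):
    """Estimate the common profile trend across samples.

    samples: list of per-sample log2 ratio lists, probes in genome order.
    A moving average is used as a stand-in for the spline model.
    """
    n_probes = len(samples[0])
    raw = [trimmed_mean([s[i] for s in samples], trim) for i in range(n_probes)]
    trend = []
    for i in range(n_probes):
        lo = max(0, i - half_window)
        hi = min(n_probes, i + half_window + 1)
        trend.append(sum(raw[lo:hi]) / (hi - lo))
    return trend

def remove_recurrent_artifacts(samples, trim=0.2, half_window=1):
    """Subtract the estimated common profile trend from every sample."""
    trend = common_profile(samples, trim, half_window)
    return [[v - t for v, t in zip(s, trend)] for s in samples]

# Toy data (hypothetical): five cells share a dip at probes 2 and 3,
# and only the first cell carries a real gain at probe 5.
cells = [[0.0, 0.0, -0.5, -0.5, 0.0, 0.6]] + \
        [[0.0, 0.0, -0.5, -0.5, 0.0, 0.0] for _ in range(4)]
corrected_cells = remove_recurrent_artifacts(cells, trim=0.2, half_window=0)
```

<p>Because the trimmed mean discards the extreme values at each probe, the gain carried by a single cell does not enter the common profile and is preserved, whereas the dip shared by all cells is removed.</p>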
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Figure S3 - genome profile of single-cell array CGH before and after recurrent genome artifacts correction</b>. <b>(a) </b>Genome plot of chromosome 3 from EBV-transformed cell 1168 before recurrent genome artifact correction. The red line represents the CBS segmentation. <b>(b) </b>Estimated common profile trend of chromosome 3 across all the EBV-transformed cells. The red line represents a lowess curve. <b>(c) </b>Genome plot of chromosome 3 from EBV-transformed cell 1168 after recurrent genome artifact correction. The red line represents the CBS segmentation.</p>
</text>
<file name="gb-2011-12-8-r80-S3.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Figure S4 - genome-wide copy number variation detection of single EBV-transformed cell 1168 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 1168 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S4.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>Figure S5 - genome-wide copy number variation detection of single EBV-transformed cell 1151 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 1151 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S5.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>Figure S6 - genome-wide copy number variation detection of single EBV-transformed cell 1160 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 1160 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S6.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>Figure S7 - genome-wide copy number variation detection of single EBV-transformed cell 1162 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 1162 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S7.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S8">
<title>
<p>Additional file 8</p>
</title>
<text>
<p>
<b>Figure S8 - genome-wide copy number variation detection of single EBV-transformed cell 614 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 614 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S8.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S9">
<title>
<p>Additional file 9</p>
</title>
<text>
<p>
<b>Figure S9 - genome-wide copy number variation detection of single EBV-transformed cell 617 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 617 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S9.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S10">
<title>
<p>Additional file 10</p>
</title>
<text>
<p>
<b>Figure S10 - genome-wide copy number variation detection of single EBV-transformed cell 1013 using existing normalization methods</b>. <b>(a-d) </b>Single-cell CNV detection of EBV-transformed cell 1013 after global loess (a), CGHnormaliter (b), poplowess (c) and Haarseg normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S10.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>Comparison of the CG and CA methods to channel clone normalization is shown in Figure <figr fid="F5">5</figr> and Table <tblr tid="T1">1</tblr>. Both the CG and CA normalization methods show lower TPRs and higher FPRs for single-cell CNV detection. These results confirm our hypothesis that not all genome artifacts can be explained by GC content. Our channel clone normalization method removes genome composition artifacts, as well as unknown recurrent genome artifacts. Therefore, the combination of channel standardization, genome composition and recurrent genome artifact corrections, which we propose, gives the best single-cell CNV detection performance, with a TPR of 0.98 and an FPR of 0.006.</p>
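<p>For readers less familiar with these measures, the sketch below computes the TPR and FPR from calls and a known truth using the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN). It is an illustration in Python only; evaluating the calls per probe, the function name and the toy data are assumptions and do not reproduce the evaluation performed in this study.</p>

```python
def tpr_fpr(called, truth):
    """Return (TPR, FPR) for paired boolean calls and truth labels.

    called[i] is True if unit i was called aberrant,
    truth[i] is True if unit i is truly aberrant.
    """
    tp = sum(1 for c, t in zip(called, truth) if c and t)
    fn = sum(1 for c, t in zip(called, truth) if not c and t)
    fp = sum(1 for c, t in zip(called, truth) if c and not t)
    tn = sum(1 for c, t in zip(called, truth) if not c and not t)
    return tp / (tp + fn), fp / (fp + tn)

# Toy data (hypothetical): 4 truly aberrant probes and 6 normal probes.
truth = [True, True, True, True, False, False, False, False, False, False]
called = [True, True, True, False, True, False, False, False, False, False]
tpr, fpr = tpr_fpr(called, truth)
```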
<p>A recent study suggests that the combination of segmentation with recurrent genome artifact correction can improve aberration detection in genomic array CGH applications <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. We tested this CGACBS approach on our single-cell array CGH data. Table <tblr tid="T1">1</tblr> shows that the TPR and FPR of CGACBS are 0.86 and 0.02, respectively, compared with 0.98 and 0.006 for channel clone normalization. CGACBS uses CBS segmented residuals for genome artifact correction to avoid overcorrection of real chromosomal aberrations. However, this also protects genome artifacts with log2 ratios comparable to those of real aberrations from being corrected, which results in more false positive aberration calls. There is therefore a trade-off between keeping real aberration signals and removing undesired genome artifacts.</p>
<p>We also compared our normalization approach to the global loess, CGHnormaliter, poplowess, Haarseg, and GADA methods. Using the TPR and FPR given in Figure <figr fid="F5">5</figr> and Table <tblr tid="T1">1</tblr>, we compared the overall CNV detection performance for global loess, CGHnormaliter, poplowess, Haarseg, and channel clone normalization. The TPR values were 0.66, 0.71, 0.71, 0.66, and 0.98, respectively, while the FPRs were 0.13, 0.09, 0.15, 0.05, and 0.006, respectively. Although the recently developed poplowess and CGHnormaliter normalization methods perform better than the original global loess normalization, they still have a high FPR. The common feature of both methods is the separation of probes with normal log2 ratios from probes with aberrant log2 ratios, followed by normalization of the data based on the normal probe log2 ratios; however, this is not suitable for single-cell array CGH. The reason is that, in the single-cell approach, many genome artifacts caused by amplification bias appear next to real aberrations. As a consequence, these genome artifacts are incorrectly segmented or clustered by the CGHnormaliter or poplowess algorithms into aberrant groups, yielding poor results.</p>
<p>The channel clone normalization method has shown its advantage in correcting recurrent genome artifacts across samples. Note, however, that CBS fails to detect a 2.22 Mb deletion at the chromosome 20 p terminus of cell 1151 after channel clone normalization (Figure S12 in Additional file <supplr sid="S12">12</supplr>). A possible reason is that this deletion is short (2.22 Mb) and located in the terminal region of the chromosome. The aberration thus shows a pattern similar to the artifacts located at the same position and is overcorrected by the channel clone normalization. However, considering the large FPR caused by chromosomal artifacts in single-cell array CGH, it is worthwhile to reduce the FPR from around 10% to 0.6%, even at the cost of missing one short aberration.</p>
<p>The performance of global loess, CGHnormaliter, poplowess, Haarseg and channel clone normalization on each genome profile is shown in Figures <figr fid="F6">6</figr> and <figr fid="F7">7</figr> and Additional files <supplr sid="S4">4</supplr> to <supplr sid="S17">17</supplr>. For instance, cell 1151 carries a known terminal 9.3 Mb duplication at the chromosome 18 p terminus (Figure <figr fid="F6">6</figr>). This duplication is called after channel clone normalization, but not after the loess-based methods. Figure <figr fid="F7">7</figr> illustrates that chromosome 21 of cell 1160 is expected to have no aberration. This is confirmed by SNP-array analysis, which revealed no loss-of-heterozygosity for this 21q-ter segment. However, the q-terminal region of this chromosome is detected as a deletion after global loess, CGHnormaliter and poplowess normalizations, resulting in a false-positive CNV region.</p>
<suppl id="S11">
<title>
<p>Additional file 11</p>
</title>
<text>
<p>
<b>Figure S11 - genome-wide copy number variation detection of single EBV-transformed cell 1168 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 1168 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S11.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S12">
<title>
<p>Additional file 12</p>
</title>
<text>
<p>
<b>Figure S12 - genome-wide copy number variation detection of single EBV-transformed cell 1151 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 1151 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S12.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S13">
<title>
<p>Additional file 13</p>
</title>
<text>
<p>
<b>Figure S13 - genome-wide copy number variation detection of single EBV-transformed cell 1160 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 1160 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S13.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S14">
<title>
<p>Additional file 14</p>
</title>
<text>
<p>
<b>Figure S14 - genome-wide copy number variation detection of single EBV-transformed cell 1162 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 1162 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S14.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S15">
<title>
<p>Additional file 15</p>
</title>
<text>
<p>
<b>Figure S15 - genome-wide copy number variation detection of single EBV-transformed cell 614 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 614 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S15.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S16">
<title>
<p>Additional file 16</p>
</title>
<text>
<p>
<b>Figure S16 - genome-wide copy number variation detection of single EBV-transformed cell 617 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 617 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S16.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S17">
<title>
<p>Additional file 17</p>
</title>
<text>
<p>
<b>Figure S17 - genome-wide copy number variation detection of single EBV-transformed cell 1013 using the channel clone normalization method</b>. Single-cell CNV detection of EBV-transformed cell 1013 after channel clone normalization. The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis the probe position along the chromosome. The blue line represents the CBS segmentation line. The red region represents the deletion and the green region represents the duplication called by CGHcall.</p>
</text>
<file name="gb-2011-12-8-r80-S17.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Copy number variation detection in chromosome 18 of an EBV-transformed sample</p></caption><text>
   <p><b>Copy number variation detection in chromosome 18 of an EBV-transformed sample</b>. <b>(a-d) </b>CBS segmentation of chromosome 18 from the EBV-transformed single lymphoblastoid cell 1151 using global loess normalization (a), CGHnormaliter (b), poplowess (c), and channel clone normalization (d). The <it>y</it>-axis represents the log2 ratios and the <it>x</it>-axis represents the coordinates along the chromosome. The blue line represents the CBS segmentation line. The green region represents the duplication region called by the CGHcall program.</p>
</text><graphic file="gb-2011-12-8-r80-6"/></fig>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>Copy number variation detection in chromosome 21 of an EBV-transformed sample</p></caption><text>
   <p><b>Copy number variation detection in chromosome 21 of an EBV-transformed sample</b>. <b>(a-d) </b>CBS segmentation of chromosome 21 from the EBV-transformed single lymphoblastoid cell 1160 using global loess normalization (a), CGHnormaliter (b), poplowess (c), and channel clone normalization (d). The blue line represents the CBS segmentation line. The red region represents the deletion region called by CGHcall.</p>
</text><graphic file="gb-2011-12-8-r80-7"/></fig>
<p>Haarseg is an algorithm integrating signal smoothing, normalization, segmentation, and copy number calling <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp>. However, this algorithm is somewhat conservative in calling chromosomal aberrations in the single-cell array CGH data, even though it gives a lower FPR than loess-based normalization methods. We also checked the performance of GADA in the single-cell application. GADA is an iterative procedure combining normalization and segmentation by sparse Bayesian learning. Around 800 breakpoints were detected in each EBV-transformed sample by GADA (Additional file <supplr sid="S18">18</supplr>). This is biologically unrealistic, and we conclude that many false positive aberrations have been detected. Although Haarseg and GADA are suitable for genomic array CGH data <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B15">15</abbr>
</abbrgrp>, they are inappropriate for single-cell array CGH data. The channel clone method outperforms these methods, having the largest TPR (0.98) and smallest FPR (0.006). Clearly, channel clone normalization improves the TPR considerably compared to these other normalization algorithms or normalization-integrated algorithms for single-cell array CGH.</p>
<suppl id="S18">
<title>
<p>Additional file 18</p>
</title>
<text>
<p>
<b>Figure S18 - genome-wide copy number variation detection of single EBV-transformed cells using the GADA algorithm</b>. Single-cell CNV detection of all seven EBV-transformed cells. Each row represents the profile of one EBV-transformed cell and each column represents one probe across all the EBV-transformed samples. Different colors in the profile represent the breakpoints of single-cell CNVs detected by GADA.</p>
</text>
<file name="gb-2011-12-8-r80-S18.JPEG">
   <p>Click here for file</p>
</file>
</suppl>
<p>Recently, a unified model that simultaneously integrates normalization, segmentation and copy number calling has been developed <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. This model has been shown to be efficient for genomic array CGH data. Its advantage is that it can incorporate existing preprocessing methods into one model. An attractive future direction would be to extend this model to account for the properties of single-cell data for single-cell CNV detection.</p>
</sec>
</sec>
<sec>
<st>
<p>Application 2: human embryo array CGH</p>
</st>
<p>In reality, the assumption that only a few probes display an aneuploid copy number and most probes display diploid copy numbers does not hold generally (for example, consider heavily rearranged blastomeres, tumor cells, and so on). It is important, therefore, to test whether channel clone normalization would overcorrect the signals of heavily aberrant samples. We applied the channel clone normalization approach to array CGH of 14 blastomeres from previously published work <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. All the blastomeres extracted from human embryo 20 carry multiple aberrations. The confirmed karyotype of each blastomere has been described in the previously published paper.</p>
<p>The results show that many artifacts are observed in the genome profile before channel clone normalization (Figure <figr fid="F8">8a,c,e</figr>). These artifacts were removed after channel clone normalization and none of the real chromosomal aberrations were overcorrected (Figure <figr fid="F8">8b,d,f</figr>). For instance, blastomere A carries aberrations in chromosomes 1, 10, 11, 13, 18, 22, and X; blastomere E carries aberrations in chromosomes 1, 2, 4, 7, 10, 11, and 22; and blastomere G carries aberrations in chromosomes 1, 4, 10, 22 and 23. Figure <figr fid="F8">8b,d,f</figr> shows that all of these aberrations were detected after channel clone normalization. Thus, channel clone normalization appears valid for heavily aberrant samples as well.</p>
<fig id="F8"><title><p>Figure 8</p></title><caption><p>Copy number variation detection in three blastomere samples</p></caption><text>
   <p><b>Copy number variation detection in three blastomere samples</b>. <b>(a,c,e) </b>Genome-wide CNV detection of blastomere A (a), blastomere E (c) and blastomere G (e) from embryo 20 before channel clone normalization. <b>(b,d,f) </b>Genome-wide CNV detection of blastomere A (b), blastomere E (d) and blastomere G (f) from embryo 20 after channel clone normalization. The <it>x</it>-axis represents the coordinate range from chromosome 1 to X and the <it>y</it>-axis represents the log2 ratios. The blue line represents the CBS segmentation line. The green regions represent the duplication region and red regions represent the deletion region called by the CGHcall program.</p>
</text><graphic file="gb-2011-12-8-r80-8"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>The analysis of CNV in single cells using high-density arrays is a novel and attractive research technique <abbrgrp>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>. It enables genome-wide analysis of blastomeres during early embryogenesis, cell development, and cancer progression <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Because the amount of DNA that can be derived from single cells is limited, amplification is necessary. However, amplifying only the test sample results in an amplification bias as well as serious genome artifacts in the log-intensity ratios, and leads to poor CNV detection in single-cell array CGH data. So far, no standard procedures have been established to correct this amplification bias and these genome artifacts for single-cell array CGH. We present a channel clone normalization method that addresses this issue.</p>
<p>The main need for a specific normalization method for single-cell array CGH, as opposed to standard genomic array CGH, arises from the fact that the amplification step in the protocol for single-cell array CGH introduces a key difference compared to array CGH using DNA extracted from a large number of cells. Indeed, only the test sample undergoes DNA amplification while the reference sample remains a DNA sample extracted from a large number of cells with the normal wild-type karyotype. This introduces a major bias in the distribution of signals between the test (amplified single-cell DNA) and reference (non-amplified DNA) samples, as well as genome artifacts, both of which our method aims to correct. Using a reference sample amplified from a single wild-type cell is unlikely to cancel out the biases caused by amplification, because the amplification bias appears to vary between samples in practice.</p>
<p>Our normalization approach is based on standardization of the distributions of the intensities of test and reference samples, genome composition artifact correction and recurrent genome artifact correction across all the samples. We have shown that our channel clone normalization method clearly improves the performance of single-cell CNV detection compared to other normalization methods, as well as to the combined normalization and segmentation methods, without losing the ability to detect real aberrations.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We have proposed a normalization strategy to handle interchannel variation and genome artifacts in two-color arrays and evaluated its applicability using simulated data and data from real single-cell array CGH experiments. Our method was designed originally for single-cell array CGH experiments, but it can be extended to other two-color array experiments that suffer from interchannel variation or genome artifacts. Our approach has the following advantages: first, it achieves good performance for the detection of genomic signals; second, it does not require complex experimental designs, which makes the experiments less expensive; and finally, it can be easily implemented without requiring long computing times.</p>
</sec>
<sec>
<st>
<p>Materials and methods</p>
</st>
<sec>
<st>
<p>Channel clone normalization</p>
</st>
<p>The pre-processing consists of four steps. Step 1, filter clones: the internal control, incorrectly annotated and low foreground-intensity clones are filtered out. Step 2, channel standardization: the log2-transformed intensities of the test sample and reference sample are standardized based on the trimmed mean and standard deviation. Step 3, genome composition artifact correction: log2 ratios are subjected to weighted linear regression on the highest correlated GC content, with larger weights for the GC-rich clones. Step 4, recurrent genome artifact correction: a profile is generated using the trimmed mean of log2 ratios for each probe across all the samples. Subsequently, the common profile trend is estimated by applying a spline model to the generated profile. Finally, the estimated common profile trend is subtracted from each individual genome profile.</p>
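<p>As an illustration of step 2, the sketch below standardizes each channel by subtracting its trimmed mean and dividing by its standard deviation. It is a minimal Python sketch, not the R code of this study. The function names, the trim fraction, the use of the untrimmed sample standard deviation, and the formation of the log2 ratio as the difference of the two standardized channels are assumptions made for the example.</p>

```python
from statistics import stdev

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    s = sorted(values)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def standardize_channel(log2_intensity, trim=0.1):
    """Center one channel on its trimmed mean and scale by its standard deviation."""
    center = trimmed_mean(log2_intensity, trim)
    scale = stdev(log2_intensity)
    return [(v - center) / scale for v in log2_intensity]

def standardized_log2_ratio(test, ref, trim=0.1):
    """Difference of the standardized test and reference channels (assumed form)."""
    t = standardize_channel(test, trim)
    r = standardize_channel(ref, trim)
    return [a - b for a, b in zip(t, r)]

# Toy data (hypothetical): the test channel is a shifted and stretched
# copy of the reference channel, mimicking an interchannel difference.
ref = [8.0, 9.0, 10.0, 11.0, 12.0, 30.0]
test = [2.0 * v + 5.0 for v in ref]
ratios = standardized_log2_ratio(test, ref)
```

<p>Standardization is invariant to a shift and rescaling of a channel, so in this toy example the two channels become identical after standardization and the resulting ratios are close to zero.</p>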
<p>The channel clone normalization approach was implemented in R 2.12.1 <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp> and the code is available in Additional file <supplr sid="S19">19</supplr>. The last three steps (channel standardization, genome composition correction and recurrent genome artifact correction) are the core steps of our approach. The impact of each normalization step is discussed in the Results section and the details of each step are explained below.</p>
<suppl id="S19">
<title>
<p>Additional file 19</p>
</title>
<text>
<p>
<b>R code to implement channel clone normalization approach</b>.</p>
</text>
<file name="gb-2011-12-8-r80-S19.R">
   <p>Click here for file</p>
</file>
</suppl>
<sec>
<st>
<p>Filtering of clones</p>
</st>
<p>First, internal control clones and clones with incomplete physical annotations are removed. Second, the median background intensity of each array across all spots is calculated. Subsequently, clones with intensities more than five-fold smaller than this median background intensity, which serves as the threshold, are filtered out <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. The threshold is chosen with the help of the MA plot of raw intensities, excluding internal control clones and clones with incomplete physical annotations. For instance, Additional file <supplr sid="S1">1</supplr> shows the MA plot of the raw intensity of EBV-transformed cell 1160, with the red spots corresponding to clones with intensities more than five-fold smaller than the median background intensity of this array. These low intensity clones show higher variability than the other clones <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp> and are thus excluded.</p>
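<p>The filtering step can be sketched as follows. This is a minimal Python illustration, not the R code of this study. The record layout (the keys <it>is_control</it>, <it>position</it>, <it>foreground</it> and <it>background</it>), the use of a single foreground intensity per clone, and the toy data are assumptions made for the example.</p>

```python
from statistics import median

def filter_clones(clones, fold=5.0):
    """Drop control clones, clones without a physical annotation, and
    clones whose foreground intensity is more than `fold` times smaller
    than the median background intensity of the array."""
    # Median background is taken across all spots of the array.
    bg_median = median([c["background"] for c in clones])
    threshold = bg_median / fold
    kept = []
    for c in clones:
        if c["is_control"] or c["position"] is None:
            continue  # internal control or incomplete annotation
        if threshold > c["foreground"]:
            continue  # low-intensity clone
        kept.append(c)
    return kept

# Toy array (hypothetical values).
clones = [
    {"id": "c1", "is_control": False, "position": ("chr1", 1000),
     "foreground": 500.0, "background": 100.0},
    {"id": "c2", "is_control": True, "position": ("chr1", 2000),
     "foreground": 800.0, "background": 100.0},
    {"id": "c3", "is_control": False, "position": None,
     "foreground": 600.0, "background": 110.0},
    {"id": "c4", "is_control": False, "position": ("chr2", 5000),
     "foreground": 10.0, "background": 90.0},
    {"id": "c5", "is_control": False, "position": ("chr3", 700),
     "foreground": 25.0, "background": 100.0},
]
kept_ids = [c["id"] for c in filter_clones(clones)]
```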
</sec>
<sec>
<st>
<p>Channel standardization</p>
</st>
<p>The log2-transformed intensities of the test sample and reference sample are standardized based on the trimmed mean and standard deviation:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-8-r80-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>T</m:mi>
   <m:mi>e</m:mi>
   <m:mi>s</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>t</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mi>j</m:mi>
         <m:mstyle class="text">
            <m:mtext>_</m:mtext>
         </m:mstyle>
         <m:mi>s</m:mi>
         <m:mi>t</m:mi>
         <m:mi>a</m:mi>
         <m:mi>n</m:mi>
         <m:mi>d</m:mi>
         <m:mi>a</m:mi>
         <m:mi>r</m:mi>
         <m:mi>d</m:mi>
         <m:mi>i</m:mi>
         <m:mi>z</m:mi>
         <m:mi>e</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mi>e</m:mi>
         <m:mi>s</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mi>t</m:mi>
         <m:mi>r</m:mi>
         <m:mi>i</m:mi>
         <m:mi>m</m:mi>
         <m:mi>m</m:mi>
         <m:mi>e</m:mi>
         <m:mi>d</m:mi>
         <m:mi>m</m:mi>
         <m:mi>e</m:mi>
         <m:mi>a</m:mi>
         <m:mi>n</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>T</m:mi>
               <m:mi>e</m:mi>
               <m:mi>s</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>j</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
         <m:mi>d</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>T</m:mi>
               <m:mi>e</m:mi>
               <m:mi>s</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>j</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>
<display-formula>
<m:math name="gb-2011-12-8-r80-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">Re</m:mtext>
   </m:mstyle>
   <m:mspace width="1em" class="nbsp"/>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mi>j</m:mi>
         <m:mtext>_standardize</m:mtext>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>R</m:mi>
         <m:mi>e</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>f</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mi>t</m:mi>
         <m:mi>r</m:mi>
         <m:mi>i</m:mi>
         <m:mi>m</m:mi>
         <m:mi>m</m:mi>
         <m:mi>e</m:mi>
         <m:mi>d</m:mi>
         <m:mi>m</m:mi>
         <m:mi>e</m:mi>
         <m:mi>a</m:mi>
         <m:mi>n</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>e</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>j</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
         <m:mi>d</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>e</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>j</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where <it>Test<sub>ij</sub></it> represents the log2-transformed intensity of the <it>i</it>-th probe of the <it>j</it>-th array derived from a test sample; <it>trimmedmean</it>(<it>Test<sub>j</sub></it>) represents the trimmed mean of the log2-transformed intensities of the <it>j</it>-th array derived from the test sample; <it>sd</it>(<it>Test<sub>j</sub></it>) represents the standard deviation of the log2-transformed intensities of the <it>j</it>-th array of the test sample; and <it>Test<sub>ij_standardize</sub></it> represents the standardized intensity of the test sample. The quantities used to calculate the standardized intensity of the reference sample, <it>Ref<sub>ij_standardize</sub></it>, are defined in the same way as for the test sample.</p>
<p>This step is expected to remove the amplification bias by bringing most of the intensities of the reference and test samples onto similar distributions without reducing the correlation between them. To make the normalization robust to outliers, the trimmed mean is used instead of the global mean. Whereas the mean is calculated from all of the observations, the trimmed mean is calculated after excluding a fixed percentage of the most extreme observations, so it is less influenced by extreme values. The percentage to trim is determined from a QQ plot of the intensities of the test sample against those of the reference sample. For instance, the QQ plot in Figure <figr fid="F4">4</figr> shows that, in our case, approximately 20% of the points have extreme values, located at the two ends of the plot.</p>
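<p>The standardization above can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The function names, the toy intensities, the even split of the trimmed fraction between the two tails, the use of the sample standard deviation, and the formation of the log2 ratio as the difference of the two standardized channels are all assumptions made for illustration.</p>

```python
import statistics


def trimmed_mean(values, trim_fraction=0.2):
    """Mean after dropping trim_fraction of the values, split evenly over both tails."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction / 2)  # number of values dropped per tail (assumed split)
    return statistics.fmean(ordered[k:len(ordered) - k])


def standardize_channel(log2_intensities, trim_fraction=0.2):
    """Centre one channel on its trimmed mean and scale it by its standard deviation."""
    centre = trimmed_mean(log2_intensities, trim_fraction)
    spread = statistics.stdev(log2_intensities)  # computed over all probes, as in the formula
    return [(value - centre) / spread for value in log2_intensities]


# Hypothetical log2 intensities for one array; two test probes are deliberately extreme.
test = [8.1, 8.3, 7.9, 8.0, 8.2, 12.5, 8.1, 7.8, 8.4, 3.0]
ref = [9.0, 9.2, 8.8, 8.9, 9.1, 9.0, 9.0, 8.7, 9.3, 8.9]

test_std = standardize_channel(test)
ref_std = standardize_channel(ref)

# Assumption: the log2 ratio is the difference of the standardized channels.
log2_ratio = [t - r for t, r in zip(test_std, ref_std)]
```

<p>In this toy example the two extreme test intensities (12.5 and 3.0) are excluded from the centre estimate, so the trimmed mean of the test channel stays at the bulk of the data.</p>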
</sec>
<sec>
<st>
<p>Genome composition artifact correction</p>
</st>
<p>This step aims to correct for genome composition-related artifacts by a weighted linear regression of the log2 ratios on the GC content of an enlarged window around each probe. The model, which includes a quadratic term in the GC content, is stated as follows:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-8-r80-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>Y</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:msubsup>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#949;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>To estimate the parameters <it>&#946;</it> = (<it>&#946;</it><sub>0</sub>, <it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2</sub>), the following expression needs to be minimized:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-8-r80-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>L</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>w</m:mi>
      </m:mrow>
   </m:msub>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
   </m:munderover>
   <m:msub>
      <m:mrow>
         <m:mi>w</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:msup>
      <m:mrow>
         <m:mfenced separators="" open="(" close=")">
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>Y</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>&#946;</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>&#946;</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msub>
               <m:msub>
                  <m:mrow>
                     <m:mi>x</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>&#946;</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msub>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>x</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>2</m:mn>
                  </m:mrow>
               </m:msubsup>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where <it>w<sub>i</sub></it> is 1,000 if <it>x<sub>i</sub></it> &gt; 0.5 and 0.01 otherwise; <it>Y<sub>i</sub></it> represents the log2 ratio of probe <it>i</it> obtained from the 'channel standardization' step; and <it>x<sub>i</sub></it> represents the GC content of the window of the selected size around probe <it>i</it>.</p>
<p>Firstly, the GC content of windows of different sizes, ranging from 0 to 1 Mb, around each probe is extracted from the human genome sequence. Next, for each window size, the correlation between GC content and log2 ratio is calculated, and the window size with the highest correlation is selected to fit the model. Thirdly, a large weight (1,000) is assigned to probes with high GC content (<it>x<sub>i</sub></it> &gt; 0.5), whereas a small weight (0.01) is assigned to the remaining probes. The residual <it>&#949;<sub>i</sub></it> of the model is the log2 ratio after the genome composition correction.</p>
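<p>As an illustration only, the weighted fit can be sketched in Python by solving the normal equations of the usual weighted least-squares criterion, that is, the weighted sum of squared residuals. The function names and the toy data are hypothetical, and the selection of the window size is assumed to have been done beforehand.</p>

```python
def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gc_correct(log2_ratios, gc_content, high_weight=1000.0, low_weight=0.01):
    """Weighted least-squares fit of Y = b0 + b1*x + b2*x^2; returns (beta, residuals)."""
    weights = [high_weight if x > 0.5 else low_weight for x in gc_content]
    design = [[1.0, x, x * x] for x in gc_content]
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(w * row[p] * row[q] for w, row in zip(weights, design))
             for q in range(3)] for p in range(3)]
    xtwy = [sum(w * row[p] * y for w, row, y in zip(weights, design, log2_ratios))
            for p in range(3)]
    beta = solve_linear_system(xtwx, xtwy)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    residuals = [y - f for y, f in zip(log2_ratios, fitted)]  # GC-corrected log2 ratios
    return beta, residuals


# Hypothetical, noise-free toy data generated from a known quadratic GC trend.
gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
ratios = [0.2 - 1.0 * x + 0.8 * x * x for x in gc]
beta, corrected = gc_correct(ratios, gc)
```

<p>Because the toy data follow the model exactly, the fit recovers the generating coefficients and the corrected log2 ratios are numerically zero. With real data, the residuals carry the copy number signal that remains after the GC trend is removed.</p>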
</sec>
<sec>
<st>
<p>Recurrent genome artifact correction</p>
</st>
<p>This step corrects recurrent genome artifacts. The recurrent genome artifacts are expected to be represented by the estimated common profile across all the samples. Therefore, a common profile is generated by calculating a trimmed mean of log2 ratios for each probe across all the samples. The common profile trend is estimated using a spline smoothing function <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-8-r80-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>S</m:mi>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mi>g</m:mi>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
   </m:munderover>
   <m:msup>
      <m:mrow>
         <m:mfenced separators="" open="{" close="}">
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>Y</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>g</m:mi>
               <m:mfenced separators="" open="(" close=")">
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>t</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mi>&#945;</m:mi>
   <m:mo>&#8747; </m:mo>
   <m:msup>
      <m:mrow>
         <m:mfenced separators="" open="{" close="}">
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mi>g</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mo>&#8243;</m:mo>
                  </m:mrow>
               </m:msup>
               <m:mfenced separators="" open="(" close=")">
                  <m:mrow>
                     <m:mi>x</m:mi>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
   <m:mi>d</m:mi>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where <it>i</it> indexes the probes of an array; <it>t<sub>i</sub></it> represents the physical genome position of probe <it>i</it>; <it>Y<sub>i</sub></it> represents the log2 ratio of probe <it>i</it> after genome composition correction; <it>n</it> represents the number of knots; <it>g</it> represents a twice-differentiable function; <it>g</it>(<it>t<sub>i</sub></it>) represents the smoothed value of the log2 ratio at probe <it>i</it>; and <it>&#945;</it> represents a smoothing parameter that balances goodness of fit against model complexity. The maximum number of knots, equal to the total number of probes, was used to fit the model.</p>
<p>The smoothing spline estimate of <it>g</it> is the minimizer of <it>S</it>(<it>g</it>). It represents the common profile trend across all the samples, including recurrent genome artifacts. Subtracting the estimated common profile trend from each individual genome profile therefore removes the recurrent genome artifacts.</p>
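<p>The following Python sketch illustrates the idea of this step under simplifying assumptions. It is not the spline routine used in this study. The integral penalty is replaced by a discrete second-difference penalty, which is a discrete analogue of the smoothing spline that assumes evenly spaced probes. The function names, the value of the smoothing parameter and the trimming fraction are hypothetical.</p>

```python
import statistics


def trimmed_mean(values, trim_fraction=0.2):
    """Mean after dropping trim_fraction of the values, split evenly over both tails."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction / 2)
    return statistics.fmean(ordered[k:len(ordered) - k])


def solve_linear_system(a, b):
    """Solve a dense system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def smooth_trend(y, alpha=10.0):
    """Minimise sum (y_i - g_i)^2 + alpha * sum (g_{i-1} - 2*g_i + g_{i+1})^2 over g."""
    n = len(y)
    # System matrix I + alpha * D'D, where D takes second differences of g.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        stencil = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, di in stencil.items():
            for j, dj in stencil.items():
                a[i][j] += alpha * di * dj
    return solve_linear_system(a, list(y))


def remove_recurrent_artifacts(profiles, alpha=10.0, trim_fraction=0.2):
    """profiles: one list of log2 ratios per sample, with probes in genomic order."""
    common = [trimmed_mean(column, trim_fraction) for column in zip(*profiles)]
    trend = smooth_trend(common, alpha)  # estimated common profile trend
    return [[v - t for v, t in zip(profile, trend)] for profile in profiles]


# Hypothetical toy data: three samples sharing a slow drift, one carrying a gain.
drift = [0.05 * i for i in range(8)]
profiles = [drift[:], drift[:], [d + (0.6 if i > 4 else 0.0) for i, d in enumerate(drift)]]
corrected = remove_recurrent_artifacts(profiles)
```

<p>The dense solver is adequate only for toy inputs. For arrays with hundreds of thousands of probes, a banded or sparse solver, or a dedicated smoothing spline routine, would be needed.</p>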
</sec>
</sec>
<sec>
<st>
<p>Simulation</p>
</st>
<p>A simulation data set was generated from seven real EBV-transformed samples (described below). Firstly, the EBV data set was processed by setting the intensities of the true aberrant regions to missing values. Subsequently, the intensities of 15 samples were simulated by drawing individual probe intensities from the corresponding probe intensities of the processed EBV data set. These two steps ensure that all the simulated intensities are non-aberrant, while the simulated genome profiles retain the features of real single-cell genome profiles, including recurrent genome artifacts. Thirdly, 23 aberrations were artificially added to the simulated data, with the mean intensities of the simulated aberrations set to those of the true aberrant regions in the real EBV-transformed samples. The length of each aberration was set to approximately 20 Mb.</p>
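<p>One possible reading of this simulation procedure is sketched below in Python. The sketch is an interpretation, not the code used in this study. It assumes that each simulated probe value is drawn at random from the non-aberrant values observed at the same probe in the real samples, and that an aberration is spiked in by shifting a block of probes by a fixed amount. All names and numbers are hypothetical.</p>

```python
import random


def simulate_profiles(real_profiles, aberrant_masks, n_samples=15, seed=0):
    """Draw each simulated probe value from the non-aberrant real values at that probe."""
    rng = random.Random(seed)
    n_probes = len(real_profiles[0])
    pools = []
    for p in range(n_probes):
        # Keep only values that are not flagged as aberrant (the 'missing value' step).
        pool = [profile[p] for profile, mask in zip(real_profiles, aberrant_masks)
                if not mask[p]]
        pools.append(pool)  # assumed non-empty for every probe
    return [[rng.choice(pool) for pool in pools] for _ in range(n_samples)]


def add_aberration(profile, start, end, mean_shift):
    """Return a copy with probes start..end-1 shifted to mimic a gain or a loss."""
    shifted = [value + mean_shift for value in profile[start:end]]
    return profile[:start] + shifted + profile[end:]


# Hypothetical toy data: three real samples, three probes, one aberrant probe.
real = [[0.0, 0.1, 0.9], [0.1, -0.1, 0.0], [-0.1, 0.0, 0.1]]
masks = [[False, False, True], [False, False, False], [False, False, False]]

simulated = simulate_profiles(real, masks, n_samples=15)
spiked = add_aberration(simulated[0], start=1, end=3, mean_shift=0.5)
```

<p>Because aberrant values are excluded from the pools, the value 0.9 can never appear in a simulated profile, so any aberration in the simulated data is one that was added deliberately.</p>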
</sec>
<sec>
<st>
<p>Single EBV-transformed lymphoblastoid cell array CGH</p>
</st>
<p>Seven EBV-transformed cells derived from patients carrying known unbalanced chromosomal rearrangements were isolated, lysed and amplified by multiple displacement amplification using GenomiPhi V2 <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. Amplified single-cell DNA and non-amplified genomic DNA (500 ng) derived from a patient with Klinefelter syndrome were labeled for 2 hours by random primer labeling using Cy5 and Cy3 dCTPs and hybridized to the genome-wide Agilent 244 K array according to the manufacturer's instructions. Slides were scanned and processed with Feature Extraction software using Agilent protocol CGH-v4_10_Apr08. As a validation, genomic DNA isolated from multiple cells of the corresponding EBV-transformed lines was karyotyped and analyzed on a 250 K Affymetrix SNP array to confirm the true aberrant regions. A deleted region on a SNP array presents only a single allele and is indicated by loss of heterozygosity. Diploid regions were confirmed by heterozygous SNPs <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. The karyotype of each EBV-transformed sample is listed in Table <tblr tid="T1">1</tblr>.</p>
</sec>
<sec>
<st>
<p>Human embryo array CGH</p>
</st>
<p>Fourteen blastomeres derived from human embryos 6, 8, 15, 16, 19 and 20 carrying known chromosomal rearrangements were hybridized to the Agilent 244 K array. The experimental protocol and validation were similar to those used for the single EBV-transformed cell array CGH and are explained in detail in previously published work <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Most of these blastomeres carry multiple aberrations within one cell. The complete karyotype of each blastomere is reported in <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Gene Expression Omnibus accession numbers</p>
</st>
<p>All the single-cell data from this study are publicly accessible in the Gene Expression Omnibus under SuperSeries [GSE31219]. GSE31219 contains the single EBV-transformed lymphoblastoid cell array data and the human blastomere array data. The previously published human blastomere data are accessible through Gene Expression Omnibus series accession number [GSE11663].</p>
</sec>
<sec>
<st>
<p>Evaluation of CNV calling in single-cell array CGH experiments</p>
</st>
<p>The parameters of the CBS algorithm were optimized to detect validated known CNVs of EBV-transformed single cells. The CGHcall program was used to call the CNVs in single cells. It fits each CBS segment to a mixture model with four states and calls each segment as a duplication, deletion, amplification or normal state <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. We calculated the TPR and FPR to evaluate the CNV detection. TPR was defined as the length of CGHcall CNVs within the true aberrant regions divided by the total length of true aberrant regions. FPR was defined as the length of CGHcall CNVs outside the aberrant regions divided by the total non-aberrant region lengths <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. The CBS algorithm was implemented using the R package snapCGH <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
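<p>The length-based TPR and FPR defined above can be computed as in the following Python sketch. The interval representation, the function names and the toy coordinates are assumptions made for illustration. The sketch treats a single chromosome, assumes that the intervals within each list do not overlap one another, and ignores the type of call (gain or loss).</p>

```python
def overlap_length(calls, truth):
    """Total length shared by two lists of (start, end) intervals."""
    return sum(max(0, min(c_end, t_end) - max(c_start, t_start))
               for c_start, c_end in calls
               for t_start, t_end in truth)


def total_length(intervals):
    """Summed length of a list of (start, end) intervals."""
    return sum(end - start for start, end in intervals)


def tpr_fpr(calls, truth, genome_length):
    """Length-based true and false positive rates of CNV calls against known regions."""
    called_true = overlap_length(calls, truth)           # called length inside true regions
    called_false = total_length(calls) - called_true     # called length outside true regions
    tpr = called_true / total_length(truth)
    fpr = called_false / (genome_length - total_length(truth))
    return tpr, fpr


# Hypothetical example: one true 20 Mb aberration on a 100 Mb chromosome.
truth = [(20_000_000, 40_000_000)]
calls = [(25_000_000, 40_000_000), (90_000_000, 95_000_000)]
tpr, fpr = tpr_fpr(calls, truth, genome_length=100_000_000)
```

<p>For these toy intervals, 15 Mb of the 20 Mb true region is called, giving a TPR of 0.75, and 5 Mb of the 80 Mb non-aberrant region is called, giving an FPR of 0.0625.</p>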
</sec>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>CBS: circular binary segmentation; CGH: comparative genomic hybridization; CNV: copy number variation; EBV: Epstein-Barr virus; FPR: false positive rate; GADA: genome alteration detection analysis; QQ: quantile-quantile; SD: standard deviation; SNP: single nucleotide polymorphism; TPR: true positive rate.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>JRV designed the novel single-cell experiment and critically reviewed the manuscript. EV and TV designed the novel single-cell experiment, carried out the single-cell experiments and generated the array data and critically reviewed the manuscript. PK was involved in the statistical discussion and critically reviewed the manuscript. YM conceived of the study, guided JC to develop the algorithm and critically reviewed the manuscript. JC performed the analysis and wrote the manuscript. All authors have read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We are grateful to the editor and two reviewers for reviewing the manuscript and providing valuable comments. This work was supported by the Research Council K.U.Leuven: ProMeta, GOA Ambiorics, GOA MaNet, CoE EF/05/007 SymBioSys, START 1, several PhD/postdoc and fellowship grants; the Flemish government: FWO-PhD/postdoc grants, FWO-projects, G.0318.05 (subfunctionalization), G.0553.06 (VitamineD), G.0302.07 (SVM/Kernel), FWO-research communities (ICCoS, ANMMM, MLDM), G.0733.09 (3UTR), G.082409 (EGFR), IWT-PhD Grants, Silicos, SBO-BioFrame, SBO-MoKa, SBO-60848, TBM-IOTA3, FOD-Cancer plans; Belgian Federal Science Policy Office: IUAP P6/25 (BioMaGNet, Bioinformatics and Modeling: from Genomes to Networks, 2007-2011); EU-RTD: ERNSI: European Research Network on System Identification; FP7-HEALTH CHeartED. We would like to thank Agilent for providing us with the Agilent 244 K arrays. We would also like to thank Sigrun Jackmaert for preparing and performing the array experiments. We appreciate theoretical discussions with Leon Charles Tranchevent and Kristof Engelen. EV was supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Single-cell chromosomal imbalances detection by array CGH.</p></title><aug><au><snm>Le Caignec</snm><fnm>C</fnm></au><au><snm>Spits</snm><fnm>C</fnm></au><au><snm>Sermon</snm><fnm>K</fnm></au><au><snm>De Rycke</snm><fnm>M</fnm></au><au><snm>Thienpont</snm><fnm>B</fnm></au><au><snm>Debrock</snm><fnm>S</fnm></au><au><snm>Staessen</snm><fnm>C</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au><au><snm>Fryns</snm><fnm>J-P</fnm></au><au><snm>Van Steirteghem</snm><fnm>A</fnm></au><au><snm>Liebaers</snm><fnm>I</fnm></au><au><snm>Vermeesch</snm><fnm>JR</fnm></au></aug><source>Nucleic acids research</source><pubdate>2006</pubdate><volume>34</volume><fpage>e68</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl336</pubid><pubid idtype="pmpid" link="fulltext">16698960</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Chromosome instability is common in human cleavage-stage embryos.</p></title><aug><au><snm>Vanneste</snm><fnm>E</fnm></au><au><snm>Voet</snm><fnm>T</fnm></au><au><snm>Le Caignec</snm><fnm>C</fnm></au><au><snm>Ampe</snm><fnm>M</fnm></au><au><snm>Konings</snm><fnm>P</fnm></au><au><snm>Melotte</snm><fnm>C</fnm></au><au><snm>Debrock</snm><fnm>S</fnm></au><au><snm>Amyere</snm><fnm>M</fnm></au><au><snm>Vikkula</snm><fnm>M</fnm></au><au><snm>Schuit</snm><fnm>F</fnm></au><au><snm>Fryns</snm><fnm>J-P</fnm></au><au><snm>Verbeke</snm><fnm>G</fnm></au><au><snm>D&apos;Hooghe</snm><fnm>T</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au><au><snm>Vermeesch</snm><fnm>JR</fnm></au></aug><source>Nature medicine</source><pubdate>2009</pubdate><volume>15</volume><fpage>577</fpage><lpage>583</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nm.1924</pubid><pubid idtype="pmpid" link="fulltext">19396175</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>High resolution array-CGH analysis of single 
cells.</p></title><aug><au><snm>Fiegler</snm><fnm>H</fnm></au><au><snm>Geigl</snm><fnm>JB</fnm></au><au><snm>Langer</snm><fnm>S</fnm></au><au><snm>Rigler</snm><fnm>D</fnm></au><au><snm>Porter</snm><fnm>K</fnm></au><au><snm>Unger</snm><fnm>K</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Speicher</snm><fnm>MR</fnm></au></aug><source>Nucleic acids research</source><pubdate>2007</pubdate><volume>35</volume><fpage>e15</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl1030</pubid><pubid idtype="pmcid">1807964</pubid><pubid idtype="pmpid" link="fulltext">17178751</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Karyomapping: a universal method for genome wide analysis of genetic disease based on mapping crossovers between parental haplotypes.</p></title><aug><au><snm>Handyside</snm><fnm>AH</fnm></au><au><snm>Harton</snm><fnm>GL</fnm></au><au><snm>Mariani</snm><fnm>B</fnm></au><au><snm>Thornhill</snm><fnm>AR</fnm></au><au><snm>Affara</snm><fnm>N</fnm></au><au><snm>Shaw</snm><fnm>M-A</fnm></au><au><snm>Griffin</snm><fnm>DK</fnm></au></aug><source>Journal of medical genetics</source><pubdate>2010</pubdate><volume>47</volume><fpage>651</fpage><lpage>658</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1136/jmg.2009.069971</pubid><pubid idtype="pmpid" link="fulltext">19858130</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Two color hybridization analysis using high density oligonucleotide arrays and energy transfer dyes</p></title><aug><au><snm>Hacia</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1998</pubdate><volume>26</volume><fpage>3865</fpage><lpage>3866</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/26.16.3865</pubid><pubid idtype="pmcid">147769</pubid><pubid idtype="pmpid" link="fulltext">9685507</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Whole-genome multiple displacement amplification from single 
cells.</p></title><aug><au><snm>Spits</snm><fnm>C</fnm></au><au><snm>Le Caignec</snm><fnm>C</fnm></au><au><snm>De Rycke</snm><fnm>M</fnm></au><au><snm>Van Haute</snm><fnm>L</fnm></au><au><snm>Van Steirteghem</snm><fnm>A</fnm></au><au><snm>Liebaers</snm><fnm>I</fnm></au><au><snm>Sermon</snm><fnm>K</fnm></au></aug><source>Nature protocols</source><pubdate>2006</pubdate><volume>1</volume><fpage>1965</fpage><lpage>1970</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nprot.2006.326</pubid><pubid idtype="pmpid" link="fulltext">17487184</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Statistical issues in the analysis of DNA Copy Number Variations.</p></title><aug><au><snm>Wineinger</snm><fnm>NE</fnm></au><au><snm>Kennedy</snm><fnm>RE</fnm></au><au><snm>Erickson</snm><fnm>SW</fnm></au><au><snm>Wojczynski</snm><fnm>MK</fnm></au><au><snm>Bruder</snm><fnm>CE</fnm></au><au><snm>Tiwari</snm><fnm>HK</fnm></au></aug><source>International journal of computational biology and drug design</source><pubdate>2008</pubdate><volume>1</volume><fpage>368</fpage><lpage>395</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1504/IJCBDD.2008.022208</pubid><pubid idtype="pmcid">2747762</pubid><pubid idtype="pmpid">19774103</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Robust Locally Weighted Regression and Smoothing Scatterplots</p></title><aug><au><snm>Cleveland</snm><fnm>WS</fnm></au></aug><source>Journal of the American Statistical Association</source><pubdate>1979</pubdate><volume>74</volume><fpage>829</fpage><lpage>836</lpage><xrefbib><pubid idtype="doi">10.2307/2286407</pubid></xrefbib></bibl><bibl id="B9"><title><p>snapCGH: Segmentation, Normalization and Processing of aCGH Data Users' 
Guide</p></title><aug><au><snm>Smith</snm><fnm>M</fnm></au><au><snm>Marioni</snm><fnm>J</fnm></au><au><snm>Hardcastle</snm><fnm>T</fnm></au><au><snm>Thorne</snm><fnm>N</fnm></au></aug><url>http://www.bioconductor.org/packages/release/bioc/html/snapCGH.html</url></bibl><bibl id="B10"><title><p>Normalization of array-CGH data: influence of copy number imbalances.</p></title><aug><au><snm>Staaf</snm><fnm>J</fnm></au><au><snm>J&#246;nsson</snm><fnm>G</fnm></au><au><snm>Ringn&#233;r</snm><fnm>M</fnm></au><au><snm>Vallon-Christersson</snm><fnm>J</fnm></au></aug><source>BMC genomics</source><pubdate>2007</pubdate><volume>8</volume><fpage>382</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-8-382</pubid><pubid idtype="pmcid">2190775</pubid><pubid idtype="pmpid" link="fulltext">17953745</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>CGHnormaliter: an iterative strategy to enhance normalization of array CGH data with imbalanced aberrations.</p></title><aug><au><snm>van Houte</snm><fnm>BPP</fnm></au><au><snm>Binsl</snm><fnm>TW</fnm></au><au><snm>Hettling</snm><fnm>H</fnm></au><au><snm>Pirovano</snm><fnm>W</fnm></au><au><snm>Heringa</snm><fnm>J</fnm></au></aug><source>BMC genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>401</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-401</pubid><pubid idtype="pmcid">2748095</pubid><pubid idtype="pmpid" link="fulltext">19709427</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Smoothing waves in array CGH tumor profiles.</p></title><aug><au><snm>van de 
Wiel</snm><fnm>MA</fnm></au><au><snm>Brosens</snm><fnm>R</fnm></au><au><snm>Eilers</snm><fnm>PHC</fnm></au><au><snm>Kumps</snm><fnm>C</fnm></au><au><snm>Meijer</snm><fnm>GA</fnm></au><au><snm>Menten</snm><fnm>B</fnm></au><au><snm>Sistermans</snm><fnm>E</fnm></au><au><snm>Speleman</snm><fnm>F</fnm></au><au><snm>Timmerman</snm><fnm>ME</fnm></au><au><snm>Ylstra</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>1099</fpage><lpage>1104</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp132</pubid><pubid idtype="pmpid" link="fulltext">19276148</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>A fast and flexible method for the segmentation of aCGH data.</p></title><aug><au><snm>Ben-Yaacov</snm><fnm>E</fnm></au><au><snm>Eldar</snm><fnm>YC</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>i139</fpage><lpage>145</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">18689815</pubid></xrefbib></bibl><bibl id="B14"><title><p>Analysis of array CGH data: from signal ratio to gain and loss of DNA regions.</p></title><aug><au><snm>Hup&#233;</snm><fnm>P</fnm></au><au><snm>Stransky</snm><fnm>N</fnm></au><au><snm>Thiery</snm><fnm>J-P</fnm></au><au><snm>Radvanyi</snm><fnm>F</fnm></au><au><snm>Barillot</snm><fnm>E</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>3413</fpage><lpage>3422</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth418</pubid><pubid idtype="pmpid" link="fulltext">15381628</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Joint estimation of copy number variation and reference intensities on multiple DNA arrays using 
GADA.</p></title><aug><au><snm>Pique-Regi</snm><fnm>R</fnm></au><au><snm>Ortega</snm><fnm>A</fnm></au><au><snm>Asgharzadeh</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>1223</fpage><lpage>1230</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp119</pubid><pubid idtype="pmcid">2732310</pubid><pubid idtype="pmpid" link="fulltext">19276152</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Joint segmentation, calling, and normalization of multiple CGH profiles.</p></title><aug><au><snm>Picard</snm><fnm>F</fnm></au><au><snm>Lebarbier</snm><fnm>E</fnm></au><au><snm>Hoebeke</snm><fnm>M</fnm></au><au><snm>Rigaill</snm><fnm>G</fnm></au><au><snm>Thiam</snm><fnm>B</fnm></au><au><snm>Robin</snm><fnm>S</fnm></au></aug><source>Biostatistics</source><pubdate>2011</pubdate><volume>12</volume><fpage>413</fpage><lpage>428</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/kxq076</pubid><pubid idtype="pmpid" link="fulltext">21209153</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Circular binary segmentation for the analysis of array-based DNA copy number data.</p></title><aug><au><snm>Olshen</snm><fnm>AB</fnm></au><au><snm>Venkatraman</snm><fnm>ES</fnm></au><au><snm>Lucito</snm><fnm>R</fnm></au><au><snm>Wigler</snm><fnm>M</fnm></au></aug><source>Biostatistics</source><pubdate>2004</pubdate><volume>5</volume><fpage>557</fpage><lpage>572</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/kxh008</pubid><pubid idtype="pmpid" link="fulltext">15475419</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Preprocessing and downstream analysis of microarray DNA copy number profiles.</p></title><aug><au><snm>van de Wiel</snm><fnm>Ma</fnm></au><au><snm>Picard</snm><fnm>F</fnm></au><au><snm>van Wieringen</snm><fnm>WN</fnm></au><au><snm>Ylstra</snm><fnm>B</fnm></au></aug><source>Briefings in 
bioinformatics</source><pubdate>2011</pubdate><volume>12</volume><fpage>10</fpage><lpage>21</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/bbq004</pubid><pubid idtype="pmpid" link="fulltext">20172948</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms.</p></title><aug><au><snm>Diskin</snm><fnm>SJ</fnm></au><au><snm>Li</snm><fnm>M</fnm></au><au><snm>Hou</snm><fnm>C</fnm></au><au><snm>Yang</snm><fnm>S</fnm></au><au><snm>Glessner</snm><fnm>J</fnm></au><au><snm>Hakonarson</snm><fnm>H</fnm></au><au><snm>Bucan</snm><fnm>M</fnm></au><au><snm>Maris</snm><fnm>JM</fnm></au><au><snm>Wang</snm><fnm>K</fnm></au></aug><source>Nucleic acids research</source><pubdate>2008</pubdate><volume>36</volume><fpage>e126</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn556</pubid><pubid idtype="pmcid">2577347</pubid><pubid idtype="pmpid" link="fulltext">18784189</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Single-cell isolation from cell suspensions and whole genome amplification from single cells to provide templates for CGH analysis.</p></title><aug><au><snm>Geigl</snm><fnm>JB</fnm></au><au><snm>Speicher</snm><fnm>MR</fnm></au></aug><source>Nature protocols</source><pubdate>2007</pubdate><volume>2</volume><fpage>3173</fpage><lpage>3184</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nprot.2007.476</pubid><pubid idtype="pmpid" link="fulltext">18079717</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Accurate single cell 24 chromosome aneuploidy screening using whole genome amplification and single nucleotide polymorphism microarrays.</p></title><aug><au><snm>Treff</snm><fnm>NR</fnm></au><au><snm>Su</snm><fnm>J</fnm></au><au><snm>Tao</snm><fnm>X</fnm></au><au><snm>Levy</snm><fnm>B</fnm></au><au><snm>Scott</snm><fnm>RT</fnm></au></aug><source>Fertility and 
sterility</source><pubdate>2010</pubdate><volume>94</volume><fpage>2017</fpage><lpage>2021</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.fertnstert.2010.01.052</pubid><pubid idtype="pmpid" link="fulltext">20188357</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Detection of chromosomal structural alterations in single cells by SNP arrays: a systematic survey of amplification bias and optimized workflow.</p></title><aug><au><snm>Iwamoto</snm><fnm>K</fnm></au><au><snm>Bundo</snm><fnm>M</fnm></au><au><snm>Ueda</snm><fnm>J</fnm></au><au><snm>Nakano</snm><fnm>Y</fnm></au><au><snm>Ukai</snm><fnm>W</fnm></au><au><snm>Hashimoto</snm><fnm>E</fnm></au><au><snm>Saito</snm><fnm>T</fnm></au><au><snm>Kato</snm><fnm>T</fnm></au></aug><source>PloS one</source><pubdate>2007</pubdate><volume>2</volume><fpage>e1306</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0001306</pubid><pubid idtype="pmcid">2111048</pubid><pubid idtype="pmpid" link="fulltext">18074030</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Identification of small gains and losses in single cells after whole genome amplification on tiling oligo arrays.</p></title><aug><au><snm>Geigl</snm><fnm>JB</fnm></au><au><snm>Obenauf</snm><fnm>AC</fnm></au><au><snm>Waldispuehl-Geigl</snm><fnm>J</fnm></au><au><snm>Hoffmann</snm><fnm>EM</fnm></au><au><snm>Auer</snm><fnm>M</fnm></au><au><snm>H&#246;rmann</snm><fnm>M</fnm></au><au><snm>Fischer</snm><fnm>M</fnm></au><au><snm>Trajanoski</snm><fnm>Z</fnm></au><au><snm>Schenk</snm><fnm>MA</fnm></au><au><snm>Baumbusch</snm><fnm>LO</fnm></au><au><snm>Speicher</snm><fnm>MR</fnm></au></aug><source>Nucleic acids research</source><pubdate>2009</pubdate><volume>37</volume><fpage>e105</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkp526</pubid><pubid idtype="pmcid">2731907</pubid><pubid idtype="pmpid" link="fulltext">19541849</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>R: A language and environment for 
statistical computing</p></title><aug><au><cnm>R Development Core Team</cnm></au></aug><url>http://www.r-project.org</url></bibl><bibl id="B25"><title><p>A comparison of background correction methods for two-colour microarrays.</p></title><aug><au><snm>Ritchie</snm><fnm>ME</fnm></au><au><snm>Silver</snm><fnm>J</fnm></au><au><snm>Oshlack</snm><fnm>A</fnm></au><au><snm>Holmes</snm><fnm>M</fnm></au><au><snm>Diyagama</snm><fnm>D</fnm></au><au><snm>Holloway</snm><fnm>A</fnm></au><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><fpage>2700</fpage><lpage>2707</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm412</pubid><pubid idtype="pmpid" link="fulltext">17720982</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Smoothing by Spline Functions</p></title><aug><au><snm>Reinsch</snm><fnm>CH</fnm></au></aug><source>Numerische Mathematik</source><pubdate>1967</pubdate><volume>10</volume><fpage>177</fpage><lpage>183</lpage><xrefbib><pubid idtype="doi">10.1007/BF02162161</pubid></xrefbib></bibl><bibl id="B27"><title><p>CGHcall: calling aberrations for array CGH tumor profiles</p></title><aug><au><snm>van de Wiel</snm><fnm>MA</fnm></au><au><snm>Kim</snm><fnm>KI</fnm></au><au><snm>Vosse</snm><fnm>SJ</fnm></au><au><snm>van Wieringen</snm><fnm>WN</fnm></au><au><snm>Wilting</snm><fnm>SM</fnm></au><au><snm>Ylstra</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><fpage>892</fpage><lpage>894</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm030</pubid><pubid idtype="pmpid" link="fulltext">17267432</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Comparative analysis of algorithms for identifying amplifications and deletions in array CGH 
data.</p></title><aug><au><snm>Lai</snm><fnm>WR</fnm></au><au><snm>Johnson</snm><fnm>MD</fnm></au><au><snm>Kucherlapati</snm><fnm>R</fnm></au><au><snm>Park</snm><fnm>PJ</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>3763</fpage><lpage>3770</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti611</pubid><pubid idtype="pmcid">2819184</pubid><pubid idtype="pmpid" link="fulltext">16081473</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>