Open Access Method

hzAnalyzer: detection, quantification, and visualization of contiguous homozygosity in high-density genotyping datasets

Todd A Johnson1,2, Yoshihito Niimura2, Hiroshi Tanaka3, Yusuke Nakamura4 and Tatsuhiko Tsunoda1*

Author Affiliations

1 Laboratory for Medical Informatics, Center for Genomic Medicine, RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa-ken, 230-0045, Japan

2 Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan

3 Department of Bioinformatics, School of Biomedical Science, Tokyo Medical and Dental University, Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan

4 Human Genome Center, Institute of Medical Science, University of Tokyo, Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan


Genome Biology 2011, 12:R21  doi:10.1186/gb-2011-12-3-r21

Published: 11 March 2011

Additional files

Additional file 1:

Figure S1. Genome-wide plot of greater confidence homozygous segments. The chromosomal positions of homozygous segments with length ≥MISLchr were plotted for all 269 samples (arrayed along the y-axis). The SNP density relative to the maximum for that chromosome is plotted at the top of each panel. Homozygous segments are color-coded by status: red lines, homozygous segments ≥MISLchr; green lines, putative autozygous segments (MAD score >10); yellow lines, ≤0.2 SNP/kb; blue lines, high missingness (no-call rate >0.05); orange lines, sample-level CNVs.
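The status colours in Figure S1 are simple threshold tests on segment properties. The following is a minimal Python sketch of how they could be assigned, not the hzAnalyzer implementation. The Segment record and its field names are hypothetical, and the MAD score is assumed here to be an unscaled robust score, (length - median) / MAD, computed over a set of segment lengths; MISLchr is supplied by the caller (per-chromosome values are given in Table S1).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Segment:
    """Hypothetical record for one homozygous segment in one sample."""
    start_bp: int        # first base (1-based, inclusive)
    end_bp: int          # last base (inclusive)
    n_snps: int          # genotyped SNPs inside the segment
    no_call_rate: float  # fraction of missing genotypes in the segment
    overlaps_cnv: bool   # True if a sample-level CNV overlaps the segment

    @property
    def length_bp(self) -> int:
        return self.end_bp - self.start_bp + 1


def mad_scores(lengths):
    """Assumed robust score per segment: (length - median) / MAD (unscaled MAD)."""
    med = median(lengths)
    mad = median(abs(x - med) for x in lengths)
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined")
    return [(x - med) / mad for x in lengths]


def flag_segment(seg, misl_chr, mad_score):
    """Return the Figure S1 colour codes that apply to one segment."""
    flags = set()
    if seg.length_bp >= misl_chr:
        flags.add("red")     # length >= MISLchr
    if mad_score > 10:
        flags.add("green")   # putative autozygous segment
    if seg.n_snps / (seg.length_bp / 1000) <= 0.2:
        flags.add("yellow")  # sparse SNP coverage (<= 0.2 SNP/kb)
    if seg.no_call_rate > 0.05:
        flags.add("blue")    # high missingness
    if seg.overlaps_cnv:
        flags.add("orange")  # sample-level CNV
    return flags
```

A segment can carry several flags at once, which matches the overlapping colour categories in the figure.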

Format: PDF Size: 1.6MB

Additional file 2:

Supplementary Tables S1 to S8.

Table S1: minimum inclusive segment lengths (bp) calculated for each chromosome for each population and across all populations.

Table S2a-d: putative autozygous segment coordinates for YRI, CEU, CHB, and JPT, respectively. Segment coordinates and parameters are listed for segments with a segment-length median MAD score greater than 10.

Table S3a-d: all detected outlier peaks and derived peak statistics for YRI, CEU, CHB, and JPT, respectively. Outlier peaks by peak height (extAUC) were determined separately for each population and chromosome, and statistics were extracted for different peak features.

Table S4a-d: all detected outlier peak regions and a summary of the underlying peak statistics for YRI, CEU, CHB, and JPT, respectively. Directly adjacent outlier peaks that were not well separated (valley > 0.5 × peak height) were merged to define peak regions, and statistics were summarized across the underlying peaks.

Table S5a,b: peaks with high-ranking extAUC and high Fst/θ between CHB and JPT, representing extended haplotypes with large frequency differences. Peaks were extracted that had high-ranking extAUC values in both CHB and JPT and extreme Fst/θ values between them (autosomes >0.0360; chromosome X >0.0538). For each peak, approximate haplotype frequencies were estimated from the allele frequencies of the differentiated loci.

Table S6a-d: candidate fixed areas, that is, genomic areas of contiguous loci with evidence for fixation. RCL0 were selected using thresholds set at the first quartiles of RCL0 SNP counts and RCL0 centimorgan extents across all populations. Selected RCL0 were then intersected with a dataset of genes and canonical coding regions, as well as with data from previous reports by Kimura et al. [27], Sabeti and colleagues [5,28], Tang et al. [29], and O'Reilly et al. [30].

Table S7a-c: outlier peak regions intersecting candidate fixed areas in CEU, CHB, or JPT. Outlier peak regions were intersected with the set of candidate fixed areas presented in Table S6a-d. These peak regions were then intersected with a dataset of genes and canonical coding regions, as well as with data from previous reports by Kimura et al. [27], Sabeti and colleagues [5,28], Tang et al. [29], and O'Reilly et al. [30].

Table S8: combined candidate fixed regions. Overlapping candidate fixed peak regions from Table S7a-c were merged into a set of combined candidate fixed regions. These regions were annotated with genes that directly intersected any candidate fixed area, as well as with the number of overlapping regions detected in each of the previous reports by Kimura et al. [27], Sabeti and colleagues [5,28], Tang et al. [29], and O'Reilly et al. [30].
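The peak-region merging rule used for Table S4 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the hzAnalyzer code: peaks are supplied in genomic order with exactly one valley value (the minimum extAUC) between each adjacent pair, and "peak height" in the valley criterion is assumed to mean the lower of the two adjacent peaks. The Peak type and merge_peaks function are hypothetical.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """Hypothetical outlier peak: span in bp and peak extAUC height."""
    start: int
    end: int
    height: float


def merge_peaks(peaks: List[Peak], valleys: List[float], ratio: float = 0.5):
    """Merge directly adjacent outlier peaks that are not well separated.

    valleys[i] is the minimum extAUC between peaks[i] and peaks[i + 1].
    Assumption: a pair is merged when valley > ratio * (lower peak height).
    """
    if len(valleys) != max(len(peaks) - 1, 0):
        raise ValueError("need exactly one valley between each adjacent pair")
    regions, current = [], peaks[:1]
    for i in range(1, len(peaks)):
        prev, nxt = peaks[i - 1], peaks[i]
        if valleys[i - 1] > ratio * min(prev.height, nxt.height):
            current.append(nxt)       # poorly separated: same region
        else:
            regions.append(current)   # well separated: close the region
            current = [nxt]
    if current:
        regions.append(current)
    # Summarise each region across its underlying peaks.
    return [
        {"start": r[0].start, "end": r[-1].end,
         "n_peaks": len(r), "max_height": max(p.height for p in r)}
        for r in regions
    ]
```

Only the maximum height is summarised here; other per-peak statistics could be aggregated over each region in the same comprehension.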

Format: XLS Size: 3.2MB

Additional file 3:

Figure S2. Chromosome profiles of percent coverage by putative autozygous segments. Putative autozygous segments were defined as homozygous segments with a length-based MAD score >10, and the percent coverage of each chromosome was calculated for each sample. Each page is labelled at the top with population name and gender. Sample profiles on each page are ordered by increasing genome-wide coverage. The y-axis maximum is set to 5.0%; for coverage values ≥5.0%, the plotted points extend off the top of the plot and the percent value is printed underneath the peak.
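Per-chromosome percent coverage is the total length of a sample's putative autozygous segments on a chromosome divided by that chromosome's length. A minimal Python sketch follows; it assumes 1-based inclusive, non-overlapping segment coordinates and uses the full chromosome length in bp as the denominator, both of which are assumptions rather than details stated in the caption. The function names are hypothetical.

```python
from collections import defaultdict


def percent_coverage(segments, chrom_lengths):
    """Percent of each chromosome covered by one sample's autozygous segments.

    segments: iterable of (chrom, start_bp, end_bp), inclusive and non-overlapping.
    chrom_lengths: dict chrom -> chromosome length in bp (assumed denominator).
    """
    covered = defaultdict(int)
    for chrom, start_bp, end_bp in segments:
        covered[chrom] += end_bp - start_bp + 1
    return {chrom: 100.0 * covered[chrom] / length
            for chrom, length in chrom_lengths.items()}


def off_scale_labels(coverage, ymax=5.0):
    """Labels for values at or above the y-axis cap, printed under the peak."""
    return {chrom: f"{value:.1f}%" for chrom, value in coverage.items()
            if value >= ymax}
```

Chromosomes with no segments report 0.0 rather than being omitted, so every profile has one point per chromosome.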

Format: PDF Size: 1.4MB

Additional file 4:

Figure S3. The local centimorgan extent of homozygosity across the genome at the 75th percentile. Homozygous extent values (cM) at the 75th percentile are plotted for each population. Physical distance (base pairs) was converted into genetic distance (centimorgans) using chromosome-arm-averaged recombination rates. To reduce the large number of plotted data points, we smoothed these values using smoothing splines and then down-sampled the predicted values. The y-axis is set dynamically to the highest observed peak on each chromosome.
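The bp-to-cM conversion with arm-averaged rates, and the 75th-percentile summary, can be illustrated with a short Python sketch. Assumptions: a segment that spans the centromere is split at a caller-supplied centromere position and each part is converted with its own arm's rate (in cM/Mb), and the percentile uses linear interpolation between order statistics. The smoothing-spline and down-sampling steps are omitted, and the function names are hypothetical.

```python
from statistics import quantiles


def segment_cm(start_bp, end_bp, centromere_bp, p_rate, q_rate):
    """Genetic length (cM) of the half-open interval [start_bp, end_bp).

    p_rate and q_rate are arm-averaged recombination rates in cM/Mb;
    the part of the segment on each arm is converted with that arm's rate.
    """
    p_bp = max(0, min(end_bp, centromere_bp) - start_bp)
    q_bp = max(0, end_bp - max(start_bp, centromere_bp))
    return (p_bp * p_rate + q_bp * q_rate) / 1e6


def percentile_75(values):
    """75th percentile with linear interpolation between order statistics."""
    return quantiles(values, n=4, method="inclusive")[2]
```

For a given locus, percentile_75 would be applied to the cM extents of the homozygous segments covering that locus across a population's samples.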

Format: PDF Size: 2.2MB

Additional file 5:

Figure S4a. Genome-wide visualization of PEmat and extAUC values for YRI. PEmat (cM) matrix values were scaled based on a maximum value of 2 cM, converted into grayscale levels, and plotted by chromosome. Cells with values ≥2 cM were set to black to compress and standardize the dynamic range. Red line: smoothed extAUC values, down-sampled for plotting. The extAUC scale is set to the maximum value observed across all autosomes, or separately to the maximum on chromosome X. Chromosomes are ordered by chromosomal base-pair length.
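The grayscale mapping and the extAUC axis scaling described in the Figure S4 captions can be sketched in Python. The 2 cM cap and the separate autosome/X scales come from the caption; the linear mapping and the convention that 1 is white and 0 is black are assumptions, and the function names are hypothetical. The same logic applies to Figures S4b-d.

```python
def pemat_to_gray(matrix, vmax=2.0):
    """Map PEmat values (cM) to grey levels in [0, 1], where 1 is white and 0 is black.

    Values >= vmax are saturated to black, compressing the dynamic range.
    """
    return [[1.0 - min(v, vmax) / vmax for v in row] for row in matrix]


def extauc_axis_max(extauc_by_chrom):
    """Per-chromosome y-axis maximum: shared across autosomes, separate for chromosome X."""
    auto_max = max(max(v) for c, v in extauc_by_chrom.items() if c != "X")
    x_max = max(extauc_by_chrom["X"]) if "X" in extauc_by_chrom else None
    return {c: (x_max if c == "X" else auto_max) for c in extauc_by_chrom}
```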

Format: PDF Size: 3MB

Additional file 6:

Figure S4b. Genome-wide visualization of PEmat and extAUC values for CEU. PEmat (cM) matrix values were scaled based on a maximum value of 2 cM, converted into grayscale levels, and plotted by chromosome. Cells with values ≥2 cM were set to black to compress and standardize the dynamic range. Red line: smoothed extAUC values, down-sampled for plotting. The extAUC scale is set to the maximum value observed across all autosomes, or separately to the maximum on chromosome X. Chromosomes are ordered by chromosomal base-pair length.

Format: PDF Size: 3.8MB

Additional file 7:

Figure S4c. Genome-wide visualization of PEmat and extAUC values for CHB. PEmat (cM) matrix values were scaled based on a maximum value of 2 cM, converted into grayscale levels, and plotted by chromosome. Cells with values ≥2 cM were set to black to compress and standardize the dynamic range. Red line: smoothed extAUC values, down-sampled for plotting. The extAUC scale is set to the maximum value observed across all autosomes, or separately to the maximum on chromosome X. Chromosomes are ordered by chromosomal base-pair length.

Format: PDF Size: 4.1MB

Additional file 8:

Figure S4d. Genome-wide visualization of PEmat and extAUC values for JPT. PEmat (cM) matrix values were scaled based on a maximum value of 2 cM, converted into grayscale levels, and plotted by chromosome. Cells with values ≥2 cM were set to black to compress and standardize the dynamic range. Red line: smoothed extAUC values, down-sampled for plotting. The extAUC scale is set to the maximum value observed across all autosomes, or separately to the maximum on chromosome X. Chromosomes are ordered by chromosomal base-pair length.

Format: PDF Size: 4.1MB

Additional file 9:

Figure S5. Comparison of the extent and frequency of homozygous segments with haplotypes underlying extAUC peaks. Analysis of the consistency between the homozygous extent distribution and the length and frequency of haplotypes underlying extAUC peaks in YRI, CEU, CHB, and JPT. Minimum segment length (Extentmin), expected haplotype frequency (Freqhap-exp), and maximum haplotype frequency (Freqhap-max) were calculated as diagrammed in Figure 5 for peaks dichotomized into non-outlier and outlier peaks. Data points were colored by a two-dimensional density estimate computed with R's densCols function (nbin = 1,024).
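Colouring points by local density helps reveal where most peaks fall in a crowded scatter plot. R's densCols computes a kernel density estimate on an nbin grid; the Python sketch below swaps in a simpler stand-in, counting how many points share each cell of an nbin x nbin grid and using that count as each point's density. It is an illustrative approximation, not a reimplementation of densCols, and binned_density is a hypothetical name.

```python
from collections import Counter


def binned_density(x, y, nbin=1024):
    """Per-point density proxy: number of points sharing the same cell of an nbin x nbin grid."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def to_bin(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant data
        # Map to 0..nbin-1; the maximum value falls in the last bin.
        return [min(int((v - lo) / span * nbin), nbin - 1) for v in values]

    cells = list(zip(to_bin(x), to_bin(y)))
    counts = Counter(cells)
    return [counts[c] for c in cells]
```

The returned counts can then be mapped onto a colour ramp, with denser cells drawn in warmer colours.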

Format: PDF Size: 231KB

Additional file 10:

Figure S6. The majority of outlier peaks intersect similarly high-ranking extAUC values in other populations. For each population and chromosome, extAUC values were passed to R's ecdf function to replace each locus's extAUC value with a rank value. For each outlier peak, the locus positions were extracted, the maximum extAUC rank value at those positions was determined in each of the other populations, and the distribution of these rank values was summarized using boxplot statistics. Outlier points are randomly jittered horizontally to reduce overlap.
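The rank transformation in Figure S6 uses the empirical cumulative distribution function, F(t) = (number of values ≤ t) / n, so each locus's rank is the fraction of loci on that chromosome with an equal or lower extAUC. A minimal Python sketch, assuming extAUC values for one other population and chromosome are supplied as a dict keyed by locus position (a hypothetical layout); the boxplot summary across peaks is left out.

```python
from bisect import bisect_right


def ecdf(values):
    """Empirical CDF like R's ecdf: F(t) = fraction of values <= t."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("ecdf needs at least one value")
    return lambda t: bisect_right(data, t) / n


def max_rank_in_other_population(peak_positions, other_extauc):
    """Maximum extAUC rank at a peak's loci in one other population.

    other_extauc maps locus position -> extAUC for that population and
    chromosome; positions absent from it are ignored (returns None if none match).
    """
    rank = ecdf(other_extauc.values())
    return max((rank(other_extauc[p]) for p in peak_positions if p in other_extauc),
               default=None)
```

Collecting this maximum for every outlier peak gives the per-population distributions summarised as boxplots.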

Format: PDF Size: 52KB

Additional file 11:

Figure S7. Phased haplotype plots for two regions with both high-ranking extAUC and high Fst/θ values between populations. Phased haplotypes were plotted for two example regions exhibiting high cross-population extAUC values as well as high population differentiation: page 1, Chr X:62.7-67 Mb; page 2, Chr 14:65.4-67 Mb.

Format: PDF Size: 227KB

Additional file 12:

Figure S8. High-ranking extAUC values and high Fst/θ between East Asian population samples identify peaks intersecting multi-locus haplotypes with large frequency differences. Peaks were selected that had high-ranking extAUC values (≥90th percentile) in both groups as well as extreme Fst/θ values (chromosome X Fst/θ >0.0538; autosome Fst/θ >0.0360). The peaks were sorted in decreasing order of the proportion of loci with extreme Fst/θ values. The top five peaks for CHB and JPT are shown.
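The selection and ranking steps for Figure S8 can be sketched in Python. The 90th-percentile rank cutoff, the chromosome-specific Fst/θ thresholds, and the top-five ordering come from the caption. The PeakStats layout is hypothetical, and requiring at least one locus above the threshold is an assumed reading of "extreme Fst/θ values".

```python
from typing import List, NamedTuple


class PeakStats(NamedTuple):
    """Hypothetical per-peak summary for one pair of populations."""
    peak_id: str
    chrom: str
    rank_pop1: float          # ecdf rank of the peak's extAUC in population 1
    rank_pop2: float          # ecdf rank of the peak's extAUC in population 2
    fst_values: List[float]   # per-locus Fst/theta between the two populations


def extreme_threshold(chrom):
    """Fst/theta cutoff from the caption: chromosome X vs autosomes."""
    return 0.0538 if chrom == "X" else 0.0360


def top_differentiated_peaks(peaks, min_rank=0.9, top_n=5):
    """Return (proportion of extreme loci, peak_id), sorted in decreasing order."""
    selected = []
    for p in peaks:
        if p.rank_pop1 < min_rank or p.rank_pop2 < min_rank or not p.fst_values:
            continue  # not high-ranking in both populations
        thr = extreme_threshold(p.chrom)
        prop = sum(f > thr for f in p.fst_values) / len(p.fst_values)
        if prop > 0:  # assumed: at least one extreme locus required
            selected.append((prop, p.peak_id))
    selected.sort(key=lambda t: t[0], reverse=True)
    return selected[:top_n]
```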

Format: PDF Size: 222KB

Additional file 13:

Figure S9. Phased haplotype plots in combined fixation candidate regions. Phased haplotypes were plotted for combined fixation candidate regions that are mentioned in the Discussion. These include three regions on chromosome X that were not reported in the other examined datasets (page 1, Chr X:104.2-105.5 Mb; page 2, Chr X:113.7-114.4 Mb; page 3, Chr X:126.1-127.7 Mb) and one region in JPT intersecting the EXOC6B gene (page 4, Chr 2:71.9-73.1 Mb).

Format: PDF Size: 312KB