Table 3

Description of the HapMap and the 1000 Genomes Project gold standards used in this study

        HapMap                                      1000GP Whole Genome
Sample  Positions  Variants  Heterozygous variants  Positions   Variants  Heterozygous variants
CEU-D   42,964     11,558    6,568                  24,926,557  14,605    9,595
CEU-M   42,967     11,455    6,460                  7,038,292   3,489     2,514
CEU-F   43,049     11,461    6,498                  12,227,792  6,030     4,012
YRI-D   43,161     12,320    7,041                  25,818,250  18,730    12,569
YRI-M   43,219     12,205    6,843                  2,356,922   1,136     831
YRI-F   43,246     12,202    6,734                  11,322,826  6,393     4,052

Each gold standard comprises, for each of the six samples, a set of positions within CCDS at which genotypes are given. We use the gold standards to compare the given genotypes with the genotype calls we make from our capture data over the same positions. The positions in the HapMap gold standard are the CCDS positions that were successfully genotyped by the HapMap project using SNP arrays. The positions in the 1000 Genomes Project gold standard are those for which we were able to obtain high-confidence genotype calls (minimum consensus quality of 100) from the 1000 Genomes Project trio pilot sequence data. For both gold standards, the Positions column gives the number of positions in the gold standard for each sample, the Variants column gives the number of gold standard genotypes that differ from the reference allele at the corresponding position, and the Heterozygous variants column gives the number of heterozygous gold standard genotypes.
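
To make the column definitions concrete, the following is a minimal sketch of how the three counts, and a comparison against capture-based calls, could be computed for one sample. It is an illustration only and not the pipeline used in the study: the data layout (a dictionary keyed by chromosome and position), the function names `summarize` and `concordance`, and the toy genotypes are all assumptions introduced here.

```python
from typing import Dict, Tuple

Position = Tuple[str, int]       # (chromosome, coordinate); hypothetical layout
Genotype = Tuple[str, str]       # unordered pair of alleles

# Hypothetical gold standard: position -> (reference allele, genotype)
GoldStandard = Dict[Position, Tuple[str, Genotype]]


def summarize(gold: GoldStandard) -> Tuple[int, int, int]:
    """Return (positions, variants, heterozygous variants) for one sample."""
    positions = len(gold)
    # A variant is a genotype with at least one allele differing from the reference.
    variants = sum(
        1 for ref, gt in gold.values() if gt[0] != ref or gt[1] != ref
    )
    # A heterozygous genotype carries two different alleles.
    heterozygous = sum(1 for _, gt in gold.values() if gt[0] != gt[1])
    return positions, variants, heterozygous


def concordance(gold: GoldStandard, calls: Dict[Position, Genotype]) -> Tuple[int, int]:
    """Return (matching genotypes, positions compared) over shared positions."""
    shared = [pos for pos in gold if pos in calls]
    # Allele order within a genotype is ignored when comparing.
    matches = sum(
        1 for pos in shared if sorted(gold[pos][1]) == sorted(calls[pos])
    )
    return matches, len(shared)


# Toy data, invented for illustration only.
toy_gold: GoldStandard = {
    ("chr1", 100): ("A", ("A", "A")),   # homozygous reference
    ("chr1", 200): ("C", ("C", "T")),   # heterozygous variant
    ("chr1", 300): ("G", ("T", "T")),   # homozygous variant
}
toy_calls: Dict[Position, Genotype] = {
    ("chr1", 100): ("A", "A"),
    ("chr1", 200): ("T", "C"),
    ("chr1", 400): ("G", "G"),          # not in the gold standard, so ignored
}

print(summarize(toy_gold))               # (3, 2, 1)
print(concordance(toy_gold, toy_calls))  # (2, 2)
```

In this sketch every heterozygous genotype is also counted as a variant, which matches the table, where the Heterozygous variants count never exceeds the Variants count.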

Parla et al. Genome Biology 2011 12:R97   doi:10.1186/gb-2011-12-9-r97
