Table 1

Extent of spurious alignment due to unique or best-guess mapping strategies

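| Condition | Read length (nt) | SNP reads | Total aligned | Discarded | Misaligned | Misaligned/Aligned |
|---|---|---|---|---|---|---|
| Best, no error | 30 | 7,413,324 | 99.96% | 0.04% | 12.11% | 12.11% |
| Best, no error | 60 | 14,826,840 | 99.81% | 0.19% | 3.02% | 3.02% |
| Best, no error | 90 | 22,240,926 | 99.55% | 0.45% | 1.15% | 1.15% |
| Best, no error | 120 | 29,654,088 | 99.21% | 0.79% | 0.66% | 0.67% |
| Uni, no error | 30 | 7,413,324 | 74.94% | 25.06% | 0.00% | 0.00% |
| Uni, no error | 60 | 14,826,840 | 89.20% | 10.80% | 0.00% | 0.00% |
| Uni, no error | 90 | 22,240,836 | 93.74% | 6.26% | 0.00% | 0.00% |
| Uni, no error | 120 | 29,654,088 | 94.98% | 5.02% | 0.00% | 0.00% |
| Best, 1% error | 30 | 4,388,175 | 96.40% | 3.60% | 12.53% | 13.00% |
| Best, 1% error | 60 | 2,771,304 | 86.33% | 13.67% | 2.99% | 3.46% |
| Best, 1% error | 90 | 2,252,588 | 74.20% | 20.62% | 1.02% | 1.38% |
| Best, 1% error | 120 | 2,132,976 | 62.14% | 37.86% | 0.51% | 0.82% |
| Uni, 1% error | 30 | 4,388,175 | 72.98% | 27.02% | 0.07% | 0.09% |
| Uni, 1% error | 60 | 2,771,304 | 78.01% | 21.99% | 0.06% | 0.08% |
| Uni, 1% error | 90 | 2,252,588 | 70.34% | 29.66% | 0.03% | 0.05% |
| Uni, 1% error | 120 | 2,132,976 | 59.82% | 40.18% | 0.02% | 0.03% |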

Reads overlapping synthetic SNPs on human chromosome 1 were mapped allowing up to k = 2 mismatches per read under four mapping conditions (best-guess without base-call error; unique without base-call error; best-guess with 1% base-call error; unique with 1% base-call error; abbreviated "Best" and "Uni" in the table) and four read lengths (30, 60, 90, and 120 nucleotides). For each experiment, the table shows the total number of mapped SNP reads, the percentages of reads that were aligned, discarded, and misaligned, and the ratio of misaligned to aligned reads. A read is considered misaligned if it does not overlap the SNP locus used to generate the read. See Figure 1b for a description of mapping strategies.
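The misalignment criterion and the Misaligned/Aligned column can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0-based half-open coordinate convention, and the reading of the ratio as the misaligned percentage divided by the aligned percentage are assumptions made here for illustration. Under that reading, the ratio agrees with several table rows to within rounding.

```python
def overlaps_snp(align_start, read_length, snp_pos):
    """Return True if the aligned interval covers the SNP position.

    Assumption: 0-based coordinates and a half-open interval
    [align_start, align_start + read_length).
    """
    return align_start <= snp_pos < align_start + read_length


def is_misaligned(align_start, read_length, snp_pos):
    """A read is misaligned if its alignment does not overlap the SNP
    locus that was used to generate it (criterion from the table caption)."""
    return not overlaps_snp(align_start, read_length, snp_pos)


def misaligned_to_aligned(misaligned_pct, aligned_pct):
    """Ratio of misaligned to aligned reads, expressed as a percentage.

    Assumption: both inputs are percentages of the same total (SNP reads).
    """
    return 100.0 * misaligned_pct / aligned_pct


if __name__ == "__main__":
    # A 30-nt read aligned at position 1000 covers positions 1000..1029.
    print(is_misaligned(1000, 30, 1015))  # False: SNP locus is covered
    print(is_misaligned(5000, 30, 1015))  # True: aligned elsewhere

    # Best-guess mapping, 1% error, 30-nt reads: 12.53% misaligned, 96.40% aligned.
    print(f"{misaligned_to_aligned(12.53, 96.40):.2f}%")  # 13.00%
```

Because the table reports percentages rounded to two decimals, recomputing the ratio from the rounded columns can differ from the published value in the last digit for some rows.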

Simola and Kim Genome Biology 2011 12:R55   doi:10.1186/gb-2011-12-6-r55
