Open Access Research

Finishing the finished human chromosome 22 sequence

Charlotte G Cole1, Owen T McCann1, John E Collins1, Karen Oliver1, David Willey1, Susan M Gribble1, Fengtang Yang1, Karen McLaren1, Jane Rogers1, Zemin Ning1, David M Beare1 and Ian Dunham1,2*

Author Affiliations

1 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

2 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Genome Biology 2008, 9:R78  doi:10.1186/gb-2008-9-5-r78

Published: 13 May 2008

Abstract

Background

Although the human genome sequence was declared complete in 2004, it was interrupted by 341 gaps, of which 308 lay in an estimated 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements remains incomplete without the gap sequences.

Results

We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence, where the original clones were found to be deleted or to contain a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

Conclusion

Thus, we have made significant progress towards completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.