Table 1

Status of DGCr1 and DGCr2 clones

                                      DGCr1    DGCr2    Total

Clones in each release                5,849    5,061   10,910
Clones stopped while in progress*       148      739      887
  Incorrect clone                         0       40       40
  Co-ligated inserts                     13      493      506
  No poly(A)                              9       97      106
  Transposable element (TE)              11       71       82
  Incomplete coding sequence            115       38      153
Candidate clones to be sequenced      5,701    4,322   10,023
Submitted to GenBank                  5,291    3,479    8,770
Clones in progress                      410      843    1,253
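The counts in the table are arithmetically related: the five stop reasons sum to the stopped clones, candidates equal released clones minus stopped clones, and clones in progress equal candidates minus submitted clones. The following is a minimal sketch that checks those relationships on the numbers transcribed from the table. The dictionary keys and the function name `check_table` are illustrative and do not come from the paper.

```python
# Counts transcribed from Table 1, as (DGCr1, DGCr2, Total).
# Key names are illustrative labels, not identifiers from the paper.
COLUMNS = ("DGCr1", "DGCr2", "Total")

TABLE = {
    "in_release": (5849, 5061, 10910),
    "stopped": (148, 739, 887),
    "incorrect_clone": (0, 40, 40),
    "co_ligated": (13, 493, 506),
    "no_poly_a": (9, 97, 106),
    "transposable_element": (11, 71, 82),
    "incomplete_cds": (115, 38, 153),
    "candidates": (5701, 4322, 10023),
    "submitted": (5291, 3479, 8770),
    "in_progress": (410, 843, 1253),
}

STOP_REASONS = (
    "incorrect_clone",
    "co_ligated",
    "no_poly_a",
    "transposable_element",
    "incomplete_cds",
)


def check_table(table):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []

    # Each row: DGCr1 + DGCr2 should equal Total.
    for row, (r1, r2, total) in table.items():
        if r1 + r2 != total:
            problems.append(f"{row}: DGCr1 + DGCr2 != Total")

    # Each column: check the relationships between rows.
    for i, col in enumerate(COLUMNS):
        reasons = sum(table[r][i] for r in STOP_REASONS)
        if reasons != table["stopped"][i]:
            problems.append(f"{col}: stop reasons do not sum to stopped")
        if table["in_release"][i] - table["stopped"][i] != table["candidates"][i]:
            problems.append(f"{col}: in_release - stopped != candidates")
        if table["candidates"][i] - table["submitted"][i] != table["in_progress"][i]:
            problems.append(f"{col}: candidates - submitted != in_progress")

    return problems


if __name__ == "__main__":
    issues = check_table(TABLE)
    print(issues if issues else "all rows consistent")
```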


*Quality-control analysis was carried out on clones during the sequencing process. Initial quality-control analysis was carried out for DGCr1 clones before full-length sequencing and for DGCr2 clones during the initial shotgun phase. This difference in timing accounts for the different frequencies of error types observed in DGCr1 and DGCr2. For example, the DGCr1 3' ESTs were generated before the clones were added to the sequencing pipeline, allowing us to eliminate co-ligated clones and clones without poly(A) tails. Conversely, DGCr2 has fewer clones with incomplete coding sequences because the DGCr2 clones were selected by aligning ESTs to the annotated genomic sequence, which is a more reliable way of selecting clones with complete ORFs than the inter se clustering of ESTs used to select DGCr1. Clones were removed from finishing if they: were the incorrect clone, as revealed by their 5'-end sequence; consisted of two cDNA molecules ligated into the same plasmid vector, as indicated by their 5'- and 3'-end reads aligning more than 300 kb apart in the genome; did not contain a poly(A) tract at their 3' end; corresponded to a member of the transposable element data set [20]; or did not extend to the ATG start site of the corresponding predicted protein in the Release 2 CDS data set. Each clone submitted to GenBank has a contiguous sequence with a phrap-estimated error rate of no more than one error per 50,000 bases. Additionally, each individual base has a phred [32,33] quality score of 25 or higher. An exception to these rules was made for 475 clones from the DGCr1 clone set that were submitted to GenBank before we raised our error-rate standard from one error in 10,000 bases to one in 50,000. These clones are undergoing additional sequencing to bring them up to the higher standard.
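The removal criteria and submission thresholds in the footnote can be sketched as simple predicates. This is an illustration under stated assumptions, not the pipeline the authors ran: the `CloneQC` record, its field names, and the order in which the criteria are tested are all hypothetical. Only the 300 kb separation, the one-in-50,000 error rate, and the phred cutoff of 25 come from the footnote. The phred-to-probability conversion uses the standard definition Q = -10 log10(P).

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Thresholds taken from the table footnote.
MAX_END_READ_SEPARATION_BP = 300_000   # 5'/3' reads farther apart => co-ligated
MAX_ERROR_RATE = 1 / 50_000            # phrap-estimated errors per base
MIN_PHRED = 25                         # minimum per-base phred quality


@dataclass
class CloneQC:
    """Hypothetical per-clone QC record; field names are assumptions."""
    matches_expected_5prime: bool
    end_read_separation_bp: int
    has_poly_a: bool
    is_transposable_element: bool
    reaches_start_codon: bool


def stop_reason(clone: CloneQC) -> Optional[str]:
    """Return the Table 1 category for a stopped clone, or None to keep it.

    The order of the checks is an assumption; the source does not say
    which criterion takes precedence when several apply.
    """
    if not clone.matches_expected_5prime:
        return "Incorrect clone"
    if clone.end_read_separation_bp > MAX_END_READ_SEPARATION_BP:
        return "Co-ligated inserts"
    if not clone.has_poly_a:
        return "No poly(A)"
    if clone.is_transposable_element:
        return "Transposable element (TE)"
    if not clone.reaches_start_codon:
        return "Incomplete coding sequence"
    return None


def phred_to_error_probability(q: float) -> float:
    """Standard phred definition: Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def meets_submission_standard(error_rate: float,
                              base_qualities: Iterable[int]) -> bool:
    """True if the sequence meets both thresholds stated in the footnote."""
    return (error_rate <= MAX_ERROR_RATE
            and all(q >= MIN_PHRED for q in base_qualities))


if __name__ == "__main__":
    clone = CloneQC(True, 450_000, True, False, True)
    print(stop_reason(clone))
    print(meets_submission_standard(1 / 60_000, [25, 40, 38]))
```

Under the standard phred definition, a score of 25 corresponds to a per-base error probability of roughly 0.3%.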

Stapleton et al. Genome Biology 2002 3:research0080.1-0080.8   doi:10.1186/gb-2002-3-12-research0080
