Table 2

Match of contigs and singletons to known databases

Match level (ID/Sbj)    Contigs                 Singletons
                        UniProt     NcRNAdb     UniProt     NcRNAdb

M0 (98%/100%)           1,982       21          173         6
M1 (95%/95%)            1,304       18          101         12
M2 (85%/90%)            2,517       72          236         20
M3 (70%/70%)            3,480       -           749         -
M4 (60%/50%)            3,603       -           1,355       -
M5 (20%/20%)            11,973      -           12,337      -


The table lists the number of hits to the given databases at various levels of matching for clusters and singletons. The cutoffs for each match level are given as alignment identity (ID) and subject coverage (Sbj) for UniProt and the noncoding RNA databases (ncRNAdb). For ncRNAs, only match levels up to M2 (alignment length greater than 30 nucleotides) are included, counting each contig/singleton only once, and the matches have been cleaned of tRNAs because these appear to be the most frequent RNAs arising from contamination, for example from E. coli. A curated list of ncRNAs for levels M0 and M1 can be found in Additional data file 1 (Table S1). See the text for further details. Note that a few conreads match the same UniProt ID. This can be due to phylogenetic decomposition or to single reads not being assembled. The total number of contigs was 48,629; the number of singletons was 73,171.
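As an illustration of how the match levels could be applied to a single hit, the sketch below assigns a hit to the strictest level whose cutoffs it meets. The threshold values are copied from the row labels of the table. Everything else is an assumption made for this example and is not taken from the paper: that both cutoffs are inclusive lower bounds, that a hit is counted only at its strictest level, and the names MATCH_LEVELS and match_level.

```python
from typing import Optional

# (level, minimum identity %, minimum subject coverage %), strictest first.
# Values are the row labels of Table 2; treating them as inclusive lower
# bounds is an assumption of this sketch.
MATCH_LEVELS = [
    ("M0", 98.0, 100.0),
    ("M1", 95.0, 95.0),
    ("M2", 85.0, 90.0),
    ("M3", 70.0, 70.0),
    ("M4", 60.0, 50.0),
    ("M5", 20.0, 20.0),
]


def match_level(identity: float, subject_coverage: float) -> Optional[str]:
    """Return the strictest level whose two cutoffs are both met, else None.

    Both arguments are percentages in the range 0 to 100.
    """
    for name, min_identity, min_coverage in MATCH_LEVELS:
        if identity >= min_identity and subject_coverage >= min_coverage:
            return name
    return None  # below the M5 cutoffs: not counted as a match


if __name__ == "__main__":
    # A hit with 96% identity covering 97% of the subject sequence.
    print(match_level(96.0, 97.0))
```

Because both cutoffs must hold at once, a hit with high identity but low subject coverage falls to a weaker level: under these assumptions, 99% identity with 60% coverage is assigned to M4 rather than M0.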

Gorodkin et al. Genome Biology 2007 8:R45   doi:10.1186/gb-2007-8-4-r45
