Table 7. Information extracted from different data sources

| Data source (version) | Information extracted (for each gene or locus) | Number of genes: obtained | Number of genes: nonredundant |
| --- | --- | --- | --- |
| Ensembl (Build 31) | Gene name, chromosome or contig, start and end positions, strand (transcription direction), exons, gene-product (including function name(s) or description(s), synonyms and EC number(s)), cross references (IDs) to other databases (SwissProt, HUGO, PDB, GO, RefSeq, OMIM, Entrez, SPTREMBL, EMBL, LocusLink). | 24,847 | |
| LocusLink (03/29/2003) | Gene name, chromosome, gene product (function name or description), function synonyms, EC number(s), gene and protein comments, cross references (IDs) to other databases (Entrez, UCSC Genome, RefSeq, GO, OMIM, UniGene, PubMed) | 18,880 | 3,936 |
| GenBank NC_001807 (mitochondrion) | Gene name, start and end positions, transcription direction, gene product (function name or description) | 35 | |


Functional information in Ensembl had to be extensively parsed to extract multiple functions, EC numbers, and/or synonyms. The 'nonredundant' column shows the number of genes from LocusLink that had no corresponding gene in the other two data sources (Ensembl and GenBank).
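
The nonredundant count described in the note above can be illustrated with a minimal sketch. It assumes that genes are matched across sources by a shared identifier; the note does not say how correspondence was actually determined, so the matching rule, the helper name `count_nonredundant`, and the toy identifiers are all hypothetical and are not taken from the paper.

```python
def count_nonredundant(candidate_ids, *other_sources):
    """Count IDs in candidate_ids that appear in none of the other sources.

    Assumption: two records denote the same gene when their IDs are equal.
    """
    # Pool every ID seen in the other sources into one set.
    seen_elsewhere = set().union(*other_sources)
    # Keep only the candidate IDs that were not seen elsewhere.
    return len(set(candidate_ids) - seen_elsewhere)


# Toy identifiers for illustration only, not data from the table.
locuslink = {"G1", "G2", "G3", "G4"}
ensembl = {"G1", "G2"}
genbank_mito = {"G9"}

print(count_nonredundant(locuslink, ensembl, genbank_mito))  # prints 2
```

In this toy case G3 and G4 occur only in the LocusLink set, so they are the two genes counted as nonredundant.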

Romero et al. Genome Biology 2004 6:R2   doi:10.1186/gb-2004-6-1-r2
