Table 2. Open reading frame predictionᵃ

| | T. turgidum | T. urartu |
|---|---:|---:|
| Contigs (n) | 140,118 | 86,247 |
| Non-wheat sequencesᵇ (eliminated) (n) | 558 | 518 |
| Wheat protein-coding sequences | | |
| BLASTX, E-value cutoff 1e-3 | 96,244 | 59,439 |
| Contigs with a Pfam domain (1e-3) | 59,917 | 39,965 |
| Contig sequences without BLASTX (1e-3) or Pfam (1e-3) | 42,999 | 26,070 |
| Predicted open reading frames | | |
| Predicted ORFs (non-redundant, >30 amino acids) | 76,570 | 43,014 |
| Full length | 32,548 | 22,868 |
| Missing 5' end | 26,723 | 12,225 |
| Missing 3' end | 12,792 | 5,376 |
| Missing 5' and 3' end | 4,507 | 2,545 |
| Putative pseudogenes (frameshift and/or premature stop codon) | 9,937 | 5,208 |
| Putative fused transcripts | | |
| Contigs with BLASTX on inconsistent strand | 4,376 | 3,628 |
| Contigs with >1 predicted ORF (>30 amino acids, no repetitive elements, not a pseudogene) | 2,164 | 1,349 |
| Putative fused transcripts (excluding overlaps) (n) | 6,409 | 4,866 |


ᵃ Open reading frames were predicted with a comparative genomics approach using the findorf program and BLASTX alignments (E-value cutoff 1e-5) between contigs and the proteomes of barley, Brachypodium, rice, maize, sorghum, and Arabidopsis.

ᵇ Non-wheat sequences were identified from the taxonomic distribution of the top 10 BLASTX hits against the NCBI nr database.

Krasileva et al. Genome Biology 2013 14:R66   doi:10.1186/gb-2013-14-6-r66
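As a quick consistency check on Table 2, the four ORF completeness classes can be summed per species and compared with the non-redundant ORF totals. Assuming the "excluding overlaps" row is the union of the two fused-transcript categories (an interpretation, not stated in the table), inclusion-exclusion also gives the number of contigs counted in both. A minimal Python sketch with counts copied from the table; the constant and function names are illustrative:

```python
# Counts copied from Table 2 as (T. turgidum, T. urartu) pairs.
ORF_CLASSES = {
    "Full length": (32_548, 22_868),
    "Missing 5' end": (26_723, 12_225),
    "Missing 3' end": (12_792, 5_376),
    "Missing 5' and 3' end": (4_507, 2_545),
}
TOTAL_ORFS = (76_570, 43_014)

FUSED_INCONSISTENT_STRAND = (4_376, 3_628)
FUSED_MULTI_ORF = (2_164, 1_349)
FUSED_EXCLUDING_OVERLAPS = (6_409, 4_866)


def column_sums(rows):
    """Sum each species column across the given rows."""
    return tuple(sum(column) for column in zip(*rows))


def implied_overlap(a, b, union):
    """Inclusion-exclusion per species: overlap = |A| + |B| - |A union B|."""
    return tuple(x + y - u for x, y, u in zip(a, b, union))


orf_sums = column_sums(ORF_CLASSES.values())
overlap = implied_overlap(FUSED_INCONSISTENT_STRAND, FUSED_MULTI_ORF,
                          FUSED_EXCLUDING_OVERLAPS)
full_length_share = tuple(
    round(full / total, 3)
    for full, total in zip(ORF_CLASSES["Full length"], TOTAL_ORFS)
)

print(orf_sums)           # (76570, 43014), matching the ORF totals
print(overlap)            # (131, 111) contigs in both fused categories
print(full_length_share)  # (0.425, 0.532)
```

Both sums reproduce the non-redundant totals exactly. The overlap counts are derived values and hold only under the union assumption above.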

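Footnote a and the "BLASTX on inconsistent strand" row both rest on filtering BLASTX alignments by E-value. The sketch below is not the findorf implementation. It assumes alignments in the default BLAST+ tabular format (`-outfmt 6`), where BLASTX reports a hit in a negative reading frame with `qstart` greater than `qend`, and it treats a contig as strand-inconsistent when its significant hits fall on both strands, which is one plausible reading of that row. The contig and protein identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tsv_text, evalue_cutoff=1e-5):
    """Yield parsed BLASTX rows whose E-value is at or below the cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=OUTFMT6_FIELDS,
                            delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= evalue_cutoff:
            yield row


def hit_strand(row):
    """Minus-frame BLASTX hits are reported with qstart > qend."""
    return "-" if int(row["qstart"]) > int(row["qend"]) else "+"


def contigs_with_mixed_strands(tsv_text, evalue_cutoff=1e-5):
    """Contigs whose significant hits fall on both strands (candidate fusions)."""
    strands = defaultdict(set)
    for row in significant_hits(tsv_text, evalue_cutoff):
        strands[row["qseqid"]].add(hit_strand(row))
    return sorted(contig for contig, seen in strands.items() if len(seen) > 1)


# Hypothetical alignments: contig_1 hits proteins on both strands;
# contig_2 has only a weak hit (E = 0.002) that fails the 1e-5 cutoff.
EXAMPLE = "\n".join([
    "contig_1\tbarley_p1\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-80\t300",
    "contig_1\trice_p7\t70.0\t200\t60\t0\t1500\t901\t1\t200\t1e-40\t160",
    "contig_2\tsorghum_p3\t40.0\t50\t30\t1\t10\t159\t5\t55\t0.002\t35",
])

print(contigs_with_mixed_strands(EXAMPLE))  # ['contig_1']
```

Passing `evalue_cutoff=1e-3` instead would mirror the looser cutoff used for the BLASTX row of the table.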
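Footnote b says non-wheat contigs were flagged from the taxonomic distribution of their top 10 BLASTX hits against nr, but it does not give the decision rule. The sketch below uses a hypothetical majority vote over coarse taxon labels (for example the NCBI taxonomy name Viridiplantae for green plants); the rule, the 0.5 threshold, and the function name are assumptions for illustration only.

```python
PLANT_TAXON = "Viridiplantae"  # NCBI taxonomy name for green plants


def is_non_wheat(hit_taxa, top_n=10, threshold=0.5):
    """Flag a contig whose top BLASTX hits against nr are mostly non-plant.

    hit_taxa: coarse taxon labels of the hits, best hit first.
    The majority-vote rule and the 0.5 threshold are illustrative assumptions,
    not the criterion used for Table 2.
    """
    top = hit_taxa[:top_n]
    if not top:
        return False  # no hits, so no taxonomic evidence either way
    non_plant = sum(1 for taxon in top if taxon != PLANT_TAXON)
    return non_plant / len(top) > threshold


fungal_contig = ["Fungi"] * 7 + [PLANT_TAXON] * 3
plant_contig = [PLANT_TAXON] * 9 + ["Bacteria"]
print(is_non_wheat(fungal_contig))  # True
print(is_non_wheat(plant_contig))   # False
```

Only the first `top_n` hits are considered, so a long tail of weaker non-plant hits cannot outvote a contig's best matches.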