Table 2

The pragmas defined by GVF, in addition to those already defined by GFF3 (gff-version, sequence-region, feature-ontology, attribute-ontology, source-ontology, species, genome-build)

Pragma

Allowed tags

Description


file-version

Comment

This pragma specifies the version of a particular file. What exactly the version means is left undefined, but the tag is provided for the case when an individual's variants are described in GVF and then, at a later date, changes to the data or the software require an update to the file. An increment of the file-version could signify such a change. Any numeric value is allowed for file-version

file-date

Comment

The file-date pragma is included as a method to describe the date when the file was created. The ISO 8601 standard for dates in the form YYYY-MM-DD is required for the value

individual-id

Dbxref, Gender, Population, Comment

This pragma provides details about the individual whose variants are described in the file

##individual-id Dbxref = Coriell:NA18507;Gender = male;Population = Yoruba;Comment = Yoruba from Ibadan

source-method

Seqid, Source, Type, Dbxref, Comment

This pragma provides details about the algorithms or methodologies used to generate data for a given source in the file. This is used, for example, to document how a particular type of variant was called. A typical use would be to provide a Dbxref link to a journal article describing the software used for calling the variant data with the given source tag

##source-method Seqid = chr1;Source = MAQ;Type = SNV;Dbxref = PMID:18714091;Comment = MAQ SNV calls;

attribute-method

Seqid, Source, Type, Attribute, Dbxref, Comment

This pragma provides details about the algorithms or methodologies used for a given attribute tag in the file. This is used to document how a particular type of attribute value (for example, Genotype or Variant_effect) was calculated

##attribute-method Source = SOLiD;Type = SNV;Attribute = Genotype;Comment = Genotype is reported here as determined in the original study

technology-platform

Seqid, Source, Type, Read_length, Read_type, Read_pair_span, Platform_class, Platform_name, Average_coverage, Comment, Dbxref

This pragma provides details about the technologies (for example, sequencing or DNA microarray) used to generate the primary data

##technology-platform Seqid = chr1;Source = AFFY_SNP_6;Type = SNV;Dbxref = URI:http://www.affymetrix.com;Platform_class = SNP_Array;Platform_name = Affymetrix Human SNP Array 6.0;

data-source

Seqid, Source, Type, Dbxref, Data_type, Comment

This pragma provides details about the source data for the variants contained in this file. This could be links to the actual sequence reads in a trace archive, or links to a variant file in another format that has been converted to GVF

##data-source Source = MAQ;Type = SNV;Dbxref = SRA:SRA008175;Data_type = DNA sequence;Comment = NCBI Short Read Archive http://www.ncbi.nlm.nih.gov/Traces/sra;

phenotype-description

Ontology, Term, Comment

A description of the phenotype of the individual. This pragma can contain ontology-constrained terms, a free-text description of the individual's phenotype, or both.

##phenotype-description Ontology = http://www.human-phenotype-ontology.org/human-phenotype-ontology.obo.gz;Term = acute myeloid leukemia;Comment = AML relapse;

ploidy

Ontology, Term, Comment

This pragma defines the ploidy for a given genome. This pragma can contain ontology-constrained terms, a free-text description of the individual's ploidy, or both. It is suggested that ontology-constrained terms use a subtype of the term PATO:0001374, which includes haploid, diploid, polyploid, triploid and so on

##ploidy chr22 1 49691432 diploid

##ploidy chrY 1 57772954 haploid
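The tag-value examples in the table can be read with a few lines of code. The following is a minimal Python sketch, not part of the GVF specification: the function name parse_pragma is hypothetical, and the sketch assumes that pairs are separated by semicolons, that the first '=' separates tag from value, and that whitespace around tags and values (as printed in the table) is not significant. Positional lines such as the ploidy examples above are outside its scope and raise an error.

```python
def parse_pragma(line):
    """Split a '##name Tag = value;Tag = value;' line into (name, tags).

    Hypothetical helper for illustration; assumes ';' separates pairs
    and the first '=' in each pair separates the tag from its value.
    """
    if not line.startswith("##"):
        raise ValueError("not a pragma line: %r" % line)

    # The pragma name is everything up to the first space.
    name, _, rest = line[2:].strip().partition(" ")

    tags = {}
    for pair in rest.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        tag, sep, value = pair.partition("=")
        if not sep:
            # Not a tag-value pair (for example, a positional ploidy line).
            raise ValueError("expected Tag=value, got %r" % pair)
        tags[tag.strip()] = value.strip()
    return name, tags


name, tags = parse_pragma(
    "##source-method Seqid = chr1;Source = MAQ;Type = SNV;"
    "Dbxref = PMID:18714091;Comment = MAQ SNV calls;"
)
print(name, tags["Dbxref"])
```

A dictionary is used here only for brevity; it would silently keep the last value if a tag were repeated on one line.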


The pragmas defined by GVF may refer to the entire file or may limit their scope by use of tag-value pairs. For example, if a pragma applies only to SNVs that were called by Gigabayes on chromosome 13, then the tags Seqid = chr13;Source = Gigabayes;Type = SNV would indicate the scope. The Dbxref tag within a GVF pragma takes values of the form 'DBTAG:ID' and provides a reference for the information given by the pragma, whether that be the location of sequence files or a link to a paper describing a method. Tags beginning with uppercase letters are reserved for future use within the GFF/GVF specification, but applications are free to provide additional tags beginning with lowercase letters.
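The scoping rule and the 'DBTAG:ID' form described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference behavior: the names SCOPE_TAGS, pragma_applies and split_dbxref are hypothetical, each scope tag is assumed to carry a single value, and a pragma that omits a scope tag is taken to be unrestricted in that dimension (so a pragma with no scope tags applies to the entire file).

```python
# Tags that limit the scope of a pragma, as used in the example above.
SCOPE_TAGS = ("Seqid", "Source", "Type")


def pragma_applies(pragma_tags, feature):
    """Return True if a pragma's scope covers the given feature.

    Both arguments are dicts keyed by tag name. Assumption: a scope tag
    that is absent from the pragma places no restriction on the feature.
    """
    return all(
        tag not in pragma_tags or pragma_tags[tag] == feature.get(tag)
        for tag in SCOPE_TAGS
    )


def split_dbxref(value):
    """Split a 'DBTAG:ID' value at the first colon into (dbtag, id)."""
    dbtag, sep, ident = value.partition(":")
    if not sep or not dbtag or not ident:
        raise ValueError("expected DBTAG:ID, got %r" % value)
    return dbtag, ident


scope = {"Seqid": "chr13", "Source": "Gigabayes", "Type": "SNV"}
snv_on_13 = {"Seqid": "chr13", "Source": "Gigabayes", "Type": "SNV"}
snv_on_1 = {"Seqid": "chr1", "Source": "Gigabayes", "Type": "SNV"}

print(pragma_applies(scope, snv_on_13))  # in scope
print(pragma_applies(scope, snv_on_1))   # different Seqid, out of scope
print(split_dbxref("PMID:18714091"))
```

Splitting at the first colon only keeps identifiers that themselves contain colons, such as URI values, intact.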

Reese et al. Genome Biology 2010 11:R88   doi:10.1186/gb-2010-11-8-r88
