Additional data file 2.

This file contains the actual ngLOC dataset in FASTA format. The header format for each sequence is > SP_name loc [/loc2], where SP_name is the Swiss-Prot name of the protein sequence (from release 50.0), loc is a single letter representing the subcellular localization of the sequence, and /loc2 is an optional field that is present only if the sequence is multi-localized. The letter codes for subcellular localization are as follows: C (CYT), cytoplasm; K (CSK), cytoskeleton; E (END), endoplasmic reticulum; S (EXC), extracellular/secreted; G (GOL), Golgi; L (LYS), lysosome; M (MIT), mitochondria; N (NUC), nucleus; P (PLA), plasma membrane; X (POX), peroxisome.
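
As an illustration of the header format described above, the following Python sketch reads records from such a file. It is not part of the ngLOC distribution: the names LOC_CODES, parse_header and read_fasta are hypothetical, and the exact spacing around the fields is an assumption, so the pattern tolerates optional whitespace after ">" and around "/".

```python
import re

# Letter codes as listed in the file description: letter -> (abbreviation, name).
LOC_CODES = {
    "C": ("CYT", "cytoplasm"),
    "K": ("CSK", "cytoskeleton"),
    "E": ("END", "endoplasmic reticulum"),
    "S": ("EXC", "extracellular/secreted"),
    "G": ("GOL", "Golgi"),
    "L": ("LYS", "lysosome"),
    "M": ("MIT", "mitochondria"),
    "N": ("NUC", "nucleus"),
    "P": ("PLA", "plasma membrane"),
    "X": ("POX", "peroxisome"),
}

# Assumed layout: ">" SP_name, whitespace, loc, then an optional "/loc2".
HEADER_RE = re.compile(r"^>\s*(\S+)\s+([A-Z])(?:\s*/\s*([A-Z]))?\s*$")


def parse_header(line):
    """Return (SP_name, [loc] or [loc, loc2]) for one FASTA header line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    name, loc, loc2 = match.groups()
    codes = [loc] if loc2 is None else [loc, loc2]
    for code in codes:
        if code not in LOC_CODES:
            raise ValueError(f"unknown localization code {code!r} in {line!r}")
    return name, codes


def read_fasta(lines):
    """Yield (SP_name, codes, sequence) for each record in an iterable of lines."""
    name, codes, chunks = None, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, codes, "".join(chunks)
            name, codes = parse_header(line)
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, codes, "".join(chunks)
```

With an open file handle, `for name, codes, seq in read_fasta(handle):` would iterate over the records; a multi-localized sequence yields two codes, a singly localized one yields one.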

King and Guda Genome Biology 2007 8:R68   doi:10.1186/gb-2007-8-5-r68