Multiple sequence alignment of the 2OG-Fe(II) dioxygenase superfamily. Individual protein families are separated by blank lines, and a brief description of each family is given to the right of the alignment. The numbers at the ends of the alignment indicate the positions of the first and last aligned residues in the respective protein sequences. The consensus secondary structure is shown above the alignment in uppercase letters. It was derived by taking the elements shared by the predicted structures of individual families and the experimentally determined structures; H indicates α helix and E indicates extended conformation (β strand). The lowercase letters represent extensions of the secondary structure elements that are seen in some, but not all, members of the superfamily. Conserved amino-terminal extensions that are specific to a single family are separated from the rest of the alignment by vertical lines. Alignment columns are colored according to the 85% consensus shown underneath the alignment, which uses the following categories of amino acid residues: h, hydrophobic; l, aliphatic; a, aromatic (Y, F, W, H, L, I, V, M, A, all shaded yellow); s, small (S, A, G, T, V, P, N, H, D, shaded blue); b, big (K, R, E, Q, W, F, Y, L, M, I, shaded gray); +, positively charged (K, R, H; colored magenta). The (predicted) catalytic residues are marked by asterisks and shown in reverse red shading. The proteins are designated by the protein/gene name, the species abbreviation and the gene identification (GI) number. Protein abbreviations are: CAS, clavaminic acid synthase; DAOCS, deacetoxycephalosporin C synthetase; EFE, ethylene-forming enzyme; FLAS, flavonol synthase; Ga20Ox, gibberellin 20-oxidase; IPNS, isopenicillin N synthase; LDOX, leucoanthocyanidin hydroxylase; Lep, leprecan; P4HA, prolyl-4-hydroxylase; PLO, lysyl hydroxylase; SanF and SanC, enzymes involved in nikkomycin biosynthesis.
The remaining names are the standard names of the genes that encode the respective proteins. Species abbreviations: At, Arabidopsis thaliana; Bb, Borrelia burgdorferi; Cc, Caulobacter crescentus; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Dm, Drosophila melanogaster; Ec, Escherichia coli; Em, Emericella nidulans; Hs, Homo sapiens; Lc, Lysobacter lactamgenus; Le, Lycopersicon esculentum; Mtu, Mycobacterium tuberculosis; Nc, Neurospora crassa; Pa, Pseudomonas aeruginosa; Pet, Petunia hybrida; Rr, Rattus rattus; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Sot, Solanum tuberosum; Scoe, Streptomyces coelicolor; Scan, Streptomyces ansochromogenes; Scla, Streptomyces clavuligerus; Ssp, Synechocystis; Vc, Vibrio cholerae; ASPV, apple stem pitting virus; ACLSV, apple chlorotic leaf spot virus; BSV, blueberry scorch virus; GLV, garlic latent virus; GVA, grapevine virus A; PBCV, Paramecium bursaria chlorella virus; PMV, papaya mosaic virus; SHVX, shallot virus X.
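The 85% consensus line described in the legend can be illustrated with a short sketch. It assumes that a column receives a class symbol when at least 85% of its residues belong to that class. The residue sets are taken directly from the legend. Because the legend gives only one combined membership list for h, l and a (the yellow classes), the sketch treats them as a single "h" class. Three further choices are assumptions and not the authors' stated procedure: gaps ("-") count against every class, the smallest qualifying class is reported when several pass, and "." marks columns with no consensus. The function names `consensus_symbol` and `consensus_line` are hypothetical.

```python
from typing import List, Optional

# Residue classes as listed in the figure legend. The legend gives one
# combined set for hydrophobic/aliphatic/aromatic (yellow), modeled as "h".
CLASSES = {
    "+": frozenset("KRH"),         # positively charged (magenta)
    "h": frozenset("YFWHLIVMA"),   # hydrophobic/aliphatic/aromatic (yellow)
    "s": frozenset("SAGTVPNHD"),   # small (blue)
    "b": frozenset("KREQWFYLMI"),  # big (gray)
}


def consensus_symbol(column: str, threshold: float = 0.85) -> Optional[str]:
    """Return the consensus class symbol for one alignment column, or None.

    Assumptions (not stated in the legend): gap characters count against
    every class, and if several classes reach the threshold, the class with
    the fewest members wins (ties go to the class listed first in CLASSES).
    """
    residues = column.upper()
    if not residues:
        return None
    passing = []
    for symbol, members in CLASSES.items():
        fraction = sum(r in members for r in residues) / len(residues)
        if fraction >= threshold:
            passing.append((symbol, len(members)))
    if not passing:
        return None
    # min() returns the first minimal element, so ties follow CLASSES order.
    return min(passing, key=lambda item: item[1])[0]


def consensus_line(alignment: List[str], threshold: float = 0.85) -> str:
    """Build a consensus string column by column ('.' = no consensus).

    Assumes all aligned sequences have equal length (zip stops at the
    shortest sequence).
    """
    columns = ("".join(col) for col in zip(*alignment))
    return "".join(consensus_symbol(c, threshold) or "." for c in columns)


if __name__ == "__main__":
    toy = ["KLG-", "RLGE", "KIAE", "KVGE"]  # made-up toy alignment
    print(consensus_line(toy))
```

With the toy alignment, the first column (all K/R) is reported as "+" rather than "b", because the positively charged class is smaller. The last column has only 75% big residues, which falls below the threshold, so it is shown as ".".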
Aravind and Koonin Genome Biology 2001 2:research0007.1 doi:10.1186/gb-2001-2-3-research0007