Nucleic acid translation

The IUPAC-IUB Encoding for Nucleic Acids

Value  Symbol  Name
65     A       Adenine
66     B       G or T or C
67     C       Cytosine
68     D       G or A or T
71     G       Guanine
72     H       A or C or T
75     K       G or T
77     M       A or C
78     N       A or G or C or T
82     R       G or A
83     S       G or C
84     T       Thymine
86     V       G or C or A
87     W       A or T
89     Y       T or C
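
The table can be expressed as a lookup from each symbol to the set of bases it stands for. The following is a minimal Python sketch, not Mascot's own code. The names IUPAC_BASES and matches are made up for illustration. The values in the first column of the table are the decimal ASCII codes of the upper-case symbols, which ord() reproduces.

```python
# IUPAC-IUB nucleotide symbols mapped to the bases they can represent.
# IUPAC_BASES and matches() are illustrative names, not part of Mascot.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT", "Y": "CT",
    "N": "ACGT",
}


def matches(symbol, base):
    """Return True if the concrete base is one of those the symbol allows."""
    return base.upper() in IUPAC_BASES[symbol.upper()]


if __name__ == "__main__":
    # Print value, symbol and allowed bases in the order of the table.
    for symbol in sorted(IUPAC_BASES):
        print(ord(symbol), symbol, " or ".join(IUPAC_BASES[symbol]))
```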

Genetic Codes

The known genetic codes are tabulated in a document compiled by Andrzej Elzanowski and Jim Ostell of the NCBI.

During a search of a nucleic acid database, Mascot uses the taxonomy of each entry to choose the correct genetic code. If no taxonomy information is present, it defaults to the standard code. Taxonomy can also be defined at the database level, to handle species-specific databases such as EST_human.
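
The selection logic described above might be sketched as follows in Python. This is an illustration only. The function choose_genetic_code, the dictionary GENETIC_CODE_BY_TAXONOMY and the use of species names as keys are all assumptions, and so is the order of precedence (entry taxonomy first, then database-level taxonomy), which the text does not specify. The numbers are NCBI translation table identifiers.

```python
STANDARD_CODE = 1  # NCBI translation table 1, the standard code

# Hypothetical lookup: species name -> NCBI translation table number.
GENETIC_CODE_BY_TAXONOMY = {
    "Homo sapiens": 1,             # standard code
    "Tetrahymena thermophila": 6,  # ciliate nuclear code
    "Mycoplasma genitalium": 4,    # Mycoplasma/Spiroplasma code
}


def choose_genetic_code(entry_taxonomy=None, database_taxonomy=None):
    """Pick a translation table for one database entry.

    Assumed precedence: the entry's own taxonomy, then the taxonomy
    defined for the whole database, then the standard code.
    """
    for taxonomy in (entry_taxonomy, database_taxonomy):
        code = GENETIC_CODE_BY_TAXONOMY.get(taxonomy)
        if code is not None:
            return code
    return STANDARD_CODE


if __name__ == "__main__":
    print(choose_genetic_code("Tetrahymena thermophila"))
    print(choose_genetic_code(None, database_taxonomy="Homo sapiens"))
    print(choose_genetic_code())
```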

In general, the code is different for mitochondrial and nuclear proteins. Although Mascot could try to determine whether a database entry is mitochondrial by performing a keyword search of the FASTA description, this is unreliable. In any case, mitochondrial proteins will usually represent only a very small fraction of the entries in any comprehensive database. The most important requirement is to use the correct code for a database that consists specifically of mitochondrial proteins. The solution adopted in Mascot is to include a flag in the taxonomy definition to specify whether the nuclear or the mitochondrial code should be used.
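
To show why the flag matters, here is a minimal Python sketch that translates the same sequence with the standard code (NCBI table 1) and the vertebrate mitochondrial code (NCBI table 2). The TaxonomyDef class, its field names and the translate function are hypothetical and do not reflect Mascot's actual configuration format. Codons containing anything other than A, C, G or T are reported as X for simplicity.

```python
from dataclasses import dataclass

BASES = "TCAG"  # base order used by the NCBI genetic code tables

# Amino acids for the 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ...
CODE_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


@dataclass
class TaxonomyDef:
    """Hypothetical taxonomy definition carrying a mitochondrial flag."""
    name: str
    nuclear_code: int = 1
    mitochondrial_code: int = 2
    use_mitochondrial: bool = False

    def genetic_code(self):
        if self.use_mitochondrial:
            return self.mitochondrial_code
        return self.nuclear_code


def translate(dna, table_id):
    """Translate frame 1 of a DNA sequence; '*' marks a stop codon."""
    amino_acids = CODE_TABLES[table_id]
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        if all(base in BASES for base in codon):
            index = (16 * BASES.index(codon[0])
                     + 4 * BASES.index(codon[1])
                     + BASES.index(codon[2]))
            protein.append(amino_acids[index])
        else:
            protein.append("X")  # ambiguous or unknown codon
    return "".join(protein)


if __name__ == "__main__":
    nuclear = TaxonomyDef("Homo sapiens")
    mito = TaxonomyDef("Homo sapiens", use_mitochondrial=True)
    print(translate("ATGTGAAGA", nuclear.genetic_code()))  # M*R
    print(translate("ATGTGAAGA", mito.genetic_code()))     # MW*
```

In the vertebrate mitochondrial code, TGA encodes tryptophan instead of a stop and AGA is a stop instead of arginine, so choosing the wrong table would truncate or extend the translated protein.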