Wikipedia: FASTA format
NCBI's FASTA format description
FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. Each record starts with a single description line beginning with ">", followed by one or more lines of sequence data. The format also allows sequence names and comments to precede the sequences.
A simple example:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
Example of a multiple-sequence FASTA file:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
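Reading such a file comes down to starting a new record at every line that begins with ">" and concatenating the sequence lines that follow it. The following is a minimal sketch in Python, not a reference implementation: the function name `parse_fasta` is a hypothetical choice, and the sample text is a shortened excerpt of the multiple-sequence example above.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Shortened excerpt of the example above (only the first lines of each record).
example = """\
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
"""

records = list(parse_fasta(example.splitlines()))
for description, sequence in records:
    print(description, len(sequence))
```

Lines that appear before the first ">" are silently skipped in this sketch; a stricter reader might reject them instead.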
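The sequence identifier at the start of the description line follows a format that depends on the source database:
Database | Identifier format |
---|---|
GenBank | gi|gi-number|gb|accession|locus |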
EMBL Data Library | gi|gi-number|emb|accession|locus |
DDBJ, DNA Database of Japan | gi|gi-number|dbj|accession|locus |
NBRF PIR | pir||entry |
Protein Research Foundation | prf||name |
SWISS-PROT | sp|accession|name |
Brookhaven Protein Data Bank (1) | pdb|entry|chain |
Brookhaven Protein Data Bank (2) | entry:chain|PDBID|CHAIN|SEQUENCE |
Patents | pat|country|number |
GenInfo Backbone Id | bbs|number |
General database identifier | gnl|database|identifier |
NCBI Reference Sequence | ref|accession|locus |
Local Sequence identifier | lcl|identifier |
Note: The gi number is a sequence of digits that identifies a database entry at NCBI.
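Because the identifier fields are separated by "|", a description line in the GenBank, EMBL, or DDBJ style can be taken apart with a plain string split. The sketch below handles only the `gi|gi-number|db|accession|locus` layout and returns `None` for anything else; the function name `parse_gi_defline` and the dictionary keys are hypothetical choices made for illustration.

```python
from typing import Dict, Optional


def parse_gi_defline(defline: str) -> Optional[Dict[str, str]]:
    """Split a 'gi|gi-number|db|accession|locus description' line into fields.

    Returns None when the line does not use the gi-style layout.
    """
    text = defline.lstrip(">")
    # The identifier ends at the first space; the rest is free-text description.
    identifier, _, description = text.partition(" ")
    fields = identifier.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        return None
    return {
        "gi": fields[1],
        "database": fields[2],   # e.g. gb, emb, dbj
        "accession": fields[3],
        "locus": fields[4] if len(fields) > 4 else "",  # may be empty
        "description": description.strip(),
    }


record = parse_gi_defline(
    ">gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]"
)
print(record)
```

The other identifier formats in the table would each need their own branch, since the number and meaning of the fields differ by database tag.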
The nucleic acid codes supported are:
Nucleic Acid Code | Meaning |
---|---|
A | Adenine |
C | Cytosine |
G | Guanine |
T | Thymine |
U | Uracil |
R | G A (puRine) |
Y | T C (pYrimidine) |
K | G T (Ketone) |
M | A C (aMino group) |
S | G C (Strong interaction) |
W | A T (Weak interaction) |
B | G T C (not A) (B comes after A) |
D | G A T (not C) (D comes after C) |
H | A C T (not G) (H comes after G) |
V | G C A (not T, not U) (V comes after U) |
N | A G C T (aNy) |
X | masked |
- | gap of indeterminate length |
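The ambiguity codes in this table can be read as sets of concrete bases, which makes it straightforward to test whether a sequence is compatible with a pattern written in these codes. The following is a minimal sketch under that reading; the names `IUPAC_NUCLEOTIDES`, `code_matches`, and `pattern_matches` are hypothetical, and the sketch deliberately leaves out `X` (masked) and `-` (gap), which have no fixed set of bases.

```python
# Each code maps to the concrete bases it may stand for, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def code_matches(code: str, base: str) -> bool:
    """Return True if the concrete base is one that the code allows."""
    allowed = IUPAC_NUCLEOTIDES.get(code.upper())
    if allowed is None:
        # X and - are intentionally unsupported in this sketch.
        raise ValueError(f"unsupported nucleic acid code: {code!r}")
    return base.upper() in allowed


def pattern_matches(pattern: str, sequence: str) -> bool:
    """Compare a pattern of codes with a sequence of equal length, position by position."""
    return len(pattern) == len(sequence) and all(
        code_matches(code, base) for code, base in zip(pattern, sequence)
    )


print(pattern_matches("GAYTN", "GACTA"))  # Y allows C, N allows A
print(pattern_matches("GAYTN", "GAGTA"))  # Y does not allow G
```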
The amino acid codes supported are:
Amino Acid Code | Meaning |
---|---|
A | Alanine |
B | Aspartic acid or Asparagine |
C | Cysteine |
D | Aspartic acid |
E | Glutamic acid |
F | Phenylalanine |
G | Glycine |
H | Histidine |
I | Isoleucine |
K | Lysine |
L | Leucine |
M | Methionine |
N | Asparagine |
P | Proline |
Q | Glutamine |
R | Arginine |
S | Serine |
T | Threonine |
U | Selenocysteine |
V | Valine |
W | Tryptophan |
Y | Tyrosine |
Z | Glutamic acid or Glutamine |
X | any |
* | translation stop |
- | gap of indeterminate length |
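A simple use of this table is to check a peptide sequence for characters that are not among the listed codes. The sketch below builds the allowed alphabet directly from the rows above; the names `AMINO_ACID_CODES` and `invalid_residues` are hypothetical, and letters absent from the table (such as J and O) are reported as invalid purely because the table does not list them.

```python
# All single-character codes that appear in the amino acid table above.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence: str):
    """Return (position, character) pairs for characters not in the table.

    Positions are 0-based; the check is case-insensitive.
    """
    return [
        (index, char)
        for index, char in enumerate(sequence.upper())
        if char not in AMINO_ACID_CODES
    ]


print(invalid_residues("MTEITAAMVKELRESTGAG"))  # every character is a listed code
print(invalid_residues("MKJO*"))                # J and O are not in the table
```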
Readseq for converting sequence formats to FASTA
Nexus to FASTA converter
GenBank to FASTA converter