Wikipedia: FASTA format
NCBI's FASTA format description
FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows a sequence name and comments to precede each sequence, on a header line that begins with `>`.
A simple example:
```
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
```
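The record above is one header line starting with `>`, followed by the sequence split across lines of a fixed width (70 characters here, 60 in the next example). A minimal Python sketch of writing a record that way; the function name `format_fasta` and its default width of 70 are illustrative choices for this example, not requirements of the format.

```python
def format_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record: a '>' header line, then the sequence wrapped at `width`.

    `format_fasta` is a hypothetical helper; the default width of 70 only
    mirrors the cytochrome b example and is not mandated by the format.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [">" + header]
    # Slice the sequence into consecutive chunks of at most `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta("SEQ_1 toy example", "ACGT" * 20, width=60), end="")
```

Keeping the width as a parameter makes it easy to match whatever line length a downstream tool expects.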
Example of a multiple-sequence FASTA file:
```
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
```
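Reading a multi-sequence file like this comes down to starting a new record at every `>` line and concatenating the lines that follow it until the next header. A minimal Python sketch, assuming plain-text input in which every record starts with a `>` header line; `read_fasta` is a hypothetical helper name, and dedicated libraries (e.g. Biopython's `Bio.SeqIO`) offer more thoroughly tested parsers.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    `header` is the text after '>' on a header line; `sequence` is the
    concatenation of every following line up to the next '>' or end of input.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    example = ">SEQUENCE_1\nMTEITAAMVK\nELRESTGAGM\n>SEQUENCE_2\nSATVSEINSE\n"
    for name, seq in read_fasta(io.StringIO(example)):
        print(name, len(seq), seq)
```

Because it is a generator, the sketch reads one record at a time instead of loading the whole file, which matters for large multi-FASTA files.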
Sequence identifier (SeqID) formats used in NCBI-style header lines:
| Database | Identifier syntax |
|---|---|
| GenBank | `gi\|gi-number\|gb\|accession\|locus` |
| EMBL Data Library | `gi\|gi-number\|emb\|accession\|locus` |
| DDBJ, DNA Database of Japan | `gi\|gi-number\|dbj\|accession\|locus` |
| NBRF PIR | `pir\|\|entry` |
| Protein Research Foundation | `prf\|\|name` |
| SWISS-PROT | `sp\|accession\|name` |
| Brookhaven Protein Data Bank (1) | `pdb\|entry\|chain` |
| Brookhaven Protein Data Bank (2) | `entry:chain\|PDBID\|CHAIN\|SEQUENCE` |
| Patents | `pat\|country\|number` |
| GenInfo Backbone Id | `bbs\|number` |
| General database identifier | `gnl\|database\|identifier` |
| NCBI Reference Sequence | `ref\|accession\|locus` |
| Local sequence identifier | `lcl\|identifier` |
Note: The gi number is a sequence of digits that identifies a record in an NCBI database.
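Because the identifier fields are separated by `|` and the identifier ends at the first whitespace, a header can be split mechanically. A sketch under the assumption that the identifier is the first whitespace-delimited token of the header and everything after it is free-text description; `split_seqid` is a hypothetical helper and does not check the fields against the table.

```python
from typing import List, Tuple


def split_seqid(header: str) -> Tuple[List[str], str]:
    """Split header text (without the leading '>') into ID fields and description.

    Assumes the identifier is the first whitespace-delimited token and that
    its fields are separated by '|'. Empty middle fields (as in 'pir||entry')
    are kept; a single trailing empty field from a final '|' is dropped.
    """
    parts = header.split(None, 1)
    seqid = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    fields = seqid.split("|") if seqid else []
    if fields and fields[-1] == "":
        fields.pop()
    return fields, description


if __name__ == "__main__":
    fields, desc = split_seqid(
        "gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]"
    )
    print(fields, desc)
```

Applied to the cytochrome b header, this yields the fields `gi`, `5524211`, `gb`, `AAD44166.1` plus the description; the second field is the gi number.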
The nucleic acid codes supported are:
| Nucleic Acid Code | Meaning |
|---|---|
| A | Adenine |
| C | Cytosine |
| G | Guanine |
| T | Thymine |
| U | Uracil |
| R | G A (puRine) |
| Y | T C (pYrimidine) |
| K | G T (Ketone) |
| M | A C (aMino group) |
| S | G C (Strong interaction) |
| W | A T (Weak interaction) |
| B | G T C (not A) (B comes after A) |
| D | G A T (not C) (D comes after C) |
| H | A C T (not G) (H comes after G) |
| V | G C A (not T, not U) (V comes after U) |
| N | A G C T (aNy) |
| X | masked |
| - | gap of indeterminate length |
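Each ambiguity code in the table stands for a set of bases, so a pattern containing codes such as `R` or `N` can be tested against a concrete sequence position by position. A sketch; treating `U` as equivalent to `T` and rejecting `X` (masked) and `-` (gap) as unsupported are assumptions of this example, and the names `IUPAC_NT`, `code_matches` and `pattern_matches` are hypothetical.

```python
# Bases each nucleotide code stands for, transcribed from the table above.
# U is mapped to T (an assumption, so DNA and RNA letters compare equal).
# X (masked) and '-' (gap) are deliberately left out.
IUPAC_NT = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def code_matches(code: str, base: str) -> bool:
    """True if the concrete base (A/C/G/T/U) is one the IUPAC code allows."""
    allowed = IUPAC_NT.get(code.upper())
    if allowed is None:
        raise ValueError(f"unsupported nucleotide code: {code!r}")
    b = "T" if base.upper() == "U" else base.upper()
    return b in allowed


def pattern_matches(pattern: str, sequence: str) -> bool:
    """True if an ambiguous pattern matches an equal-length concrete sequence."""
    return len(pattern) == len(sequence) and all(
        code_matches(p, s) for p, s in zip(pattern, sequence)
    )
```

For example, `pattern_matches("RYN", "ATG")` holds because `R` allows A, `Y` allows T and `N` allows any base.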
The amino acid codes supported are:
| Amino Acid Code | Meaning |
|---|---|
| A | Alanine |
| B | Aspartic acid or Asparagine |
| C | Cysteine |
| D | Aspartic acid |
| E | Glutamic acid |
| F | Phenylalanine |
| G | Glycine |
| H | Histidine |
| I | Isoleucine |
| K | Lysine |
| L | Leucine |
| M | Methionine |
| N | Asparagine |
| P | Proline |
| Q | Glutamine |
| R | Arginine |
| S | Serine |
| T | Threonine |
| U | Selenocysteine |
| V | Valine |
| W | Tryptophan |
| Y | Tyrosine |
| Z | Glutamic acid or Glutamine |
| X | any |
| * | translation stop |
| - | gap of indeterminate length |
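A quick sanity check on protein input is to flag any character that the table above does not list. A sketch assuming case-insensitive input; `PROTEIN_CODES` and `invalid_protein_chars` are hypothetical names, and the allowed set is taken literally from the table (so letters such as `J` or `O`, which the table omits, are reported).

```python
from typing import Set

# Every character listed in the amino acid table above, including '*' and '-'.
PROTEIN_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_protein_chars(sequence: str) -> Set[str]:
    """Return the characters of `sequence` that the table does not list.

    Lower-case letters are upper-cased first (an assumption of this sketch).
    """
    return {c for c in sequence.upper() if c not in PROTEIN_CODES}
```

An empty result means every character of the sequence appears in the table.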
Readseq for converting sequence formats to FASTA
Nexus to FASTA converter
GenBank to FASTA converter