====== FASTA ======
**[[http://en.wikipedia.org/wiki/FASTA_format|wikipedia: FASTA format]]**\\
[[http://www.ncbi.nlm.nih.gov/blast/fasta.shtml|NCBI's FASTA format description]]\\

\\
FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences.


===== format information =====
  * text based
  * there is no standard file extension for a text file containing FASTA formatted sequences; commonly used extensions include .fa, .mpfa, .fna, .fsa, .fas and .fasta

===== Data type handled =====
  * nucleic acid sequences
  * peptide sequences


===== file format =====
  * Each sequence record begins with a single-line description (the header line), followed by lines of sequence data
  * It is recommended that all lines of text be shorter than 80 characters
  * A sequence ends when another line starting with ">" appears (this indicates the start of the next sequence) or at the end of the file
**simple example:**
<code>
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
</code>

Example of a multi-sequence FASTA file:
<code>
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
</code>
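
The file format rules above can be sketched as a small reader. This is a minimal illustration in Python, not a reference implementation; the function name ''read_fasta'' is an assumption made for this example. It accepts any iterable of text lines, treats every line starting with ">" as the start of a new record, and joins the sequence lines that follow into one string.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading ">".
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
        # Sequence data that appears before the first header is skipped.
    if header is not None:
        yield header, "".join(chunks)


example = """>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
"""

records = list(read_fasta(example.splitlines()))
for name, seq in records:
    print(name, len(seq))
```

Because the function is a generator, it can also be fed an open file object, so large files do not have to be held in memory at once.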

=== Header line ===
  * begins with ">"
  * word following is the identifier and/or name of the sequence (optional)
  * rest of the line is the description (optional)
  * no space between the ">" and the first letter of the identifier
  * header line may contain more than one header separated by a ^A (Control-A) character
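
The header rules above can be illustrated with a short Python sketch. The function name ''split_header'' is hypothetical, and the sketch assumes that the identifier is the first whitespace-delimited word and that multiple headers are joined by the Control-A character (0x01).

```python
from typing import List, Tuple


def split_header(header_line: str) -> List[Tuple[str, str]]:
    """Split a FASTA header line into (identifier, description) pairs.

    One pair is returned for each header joined by Control-A (0x01).
    """
    if not header_line.startswith(">"):
        raise ValueError("a FASTA header line must begin with '>'")
    pairs = []
    for part in header_line[1:].rstrip("\r\n").split("\x01"):
        # The first word is the identifier, the rest is the description.
        fields = part.split(None, 1)
        identifier = fields[0] if fields else ""
        description = fields[1].strip() if len(fields) > 1 else ""
        pairs.append((identifier, description))
    return pairs


line = ">gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]"
print(split_header(line))
```

Both parts are optional, so either element of a pair may be an empty string.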

\\
  * Sequence identifiers:
    * Many sequence databases use standardized headers, which makes it easier to extract information from the header automatically
    * NCBI defined a standard for the unique identifier in the header line
    * NCBI does not give a definitive description of the FASTA defline format; the following table is an attempt to summarize the identifier formats:

^ Database                         ^ Identifier format ^
| GenBank                          | ''%%gi|gi-number|gb|accession|locus%%'' |
| EMBL Data Library                | ''%%gi|gi-number|emb|accession|locus%%'' |
| DDBJ, DNA Database of Japan      | ''%%gi|gi-number|dbj|accession|locus%%'' |
| NBRF PIR                         | ''%%pir||entry%%'' |
| Protein Research Foundation      | ''%%prf||name%%'' |
| SWISS-PROT                       | ''%%sp|accession|name%%'' |
| Brookhaven Protein Data Bank (1) | ''%%pdb|entry|chain%%'' |
| Brookhaven Protein Data Bank (2) | ''%%entry:chain|PDBID|CHAIN|SEQUENCE%%'' |
| Patents                          | ''%%pat|country|number%%'' |
| GenInfo Backbone Id              | ''%%bbs|number%%'' |
| General database identifier      | ''%%gnl|database|identifier%%'' |
| NCBI Reference Sequence          | ''%%ref|accession|locus%%'' |
| Local Sequence identifier        | ''%%lcl|identifier%%'' |
//Note: The gi number is a sequence of digits that identifies a database entry at NCBI.//
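
The tag-based identifier formats in the table can be taken apart with a small lookup of how many fields follow each tag. The Python sketch below is only an illustration: the names ''FIELD_COUNTS'' and ''parse_identifier'' are assumptions, the field counts are read off the table, and the untagged second Brookhaven format is deliberately not handled.

```python
from typing import Dict, List

# Number of "|"-separated fields that follow each tag (taken from the table).
FIELD_COUNTS: Dict[str, int] = {
    "gi": 1, "bbs": 1, "lcl": 1,
    "gb": 2, "emb": 2, "dbj": 2, "pir": 2, "prf": 2,
    "sp": 2, "pdb": 2, "pat": 2, "gnl": 2, "ref": 2,
}


def parse_identifier(identifier: str) -> Dict[str, List[str]]:
    """Map each recognised tag in a "|"-delimited identifier to its fields."""
    tokens = identifier.split("|")
    result: Dict[str, List[str]] = {}
    i = 0
    while i < len(tokens):
        count = FIELD_COUNTS.get(tokens[i])
        if count is None:
            i += 1  # unknown or empty token: skip it
            continue
        # Missing trailing fields simply produce a shorter list.
        result[tokens[i]] = tokens[i + 1:i + 1 + count]
        i += 1 + count
    return result


parsed = parse_identifier("gi|5524211|gb|AAD44166.1|")
print(parsed)
```

An empty string in the result marks a field that was left blank in the identifier, such as the missing locus in the example.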

\\
=== Sequence representation ===
  * Sequence data follows the header line and any comments
  * each line of a sequence should have fewer than 80 characters
  * Sequences may be protein sequences or nucleic acid sequences
  * can contain gaps or alignment characters 
  * Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: 
    * lower-case letters are accepted and are mapped into upper-case
    * a single hyphen or dash can be used to represent a gap character
    * in amino acid sequences: U and * are acceptable letters
  * Numerical digits are not allowed in the sequence itself, although some databases use them to indicate the position in the sequence
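
As a rough sketch of these conventions, the Python example below cleans up raw sequence text and re-wraps it into short lines. The function names ''normalize_sequence'' and ''wrap_sequence'' and the default width of 70 are assumptions chosen for illustration; any width below 80 satisfies the recommendation above.

```python
def normalize_sequence(raw: str) -> str:
    """Return the sequence in upper case, without whitespace or digits.

    Gap characters ("-") and "*" are kept unchanged.
    """
    kept = (ch for ch in raw if not ch.isspace() and not ch.isdigit())
    return "".join(kept).upper()


def wrap_sequence(sequence: str, width: int = 70) -> str:
    """Break a sequence into lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    pieces = (sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(pieces)


# Position numbers and spacing, as found in some database flat files.
numbered = "1 acgtacgtac gtacgtacgt\n21 acgt--acgt"
clean = normalize_sequence(numbered)
print(wrap_sequence(clean, width=10))
```

Dropping the digits discards the position information, which is acceptable here because positions can be recomputed from the cleaned sequence.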

\\
The nucleic acid codes supported are:

^ Nucleic Acid Code ^  Meaning  ^  
|       A           | Adenine |
|       C           | Cytosine |
|       G           | Guanine |
|       T           | Thymine |
|       U           | Uracil |
|       R           | G A (puRine) | 
|       Y           | T C (pYrimidine) | 
|       K           | G T (Ketone) | 
|       M           | A C (aMino group) |
|       S           | G C (Strong interaction) |
|       W           | A T (Weak interaction) | 
|       B           | G T C (not A) (B comes after A) | 
|       D           | G A T (not C) (D comes after C) |
|       H           | A C T (not G) (H comes after G) |
|       V           | G C A (not T, not U) (V comes after U) |
|       N           | A G C T (aNy) | 
|       X           | masked |
|       -           | gap of indeterminate length |
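
The ambiguity codes in this table translate directly into a lookup from each code to the bases it covers. The following Python sketch uses hypothetical names (''IUPAC_NUCLEOTIDES'', ''code_matches'', ''sequence_matches'') and assumes that "X" (masked) and "-" (gap) match no concrete base.

```python
# Each code maps to the concrete bases it can stand for (from the table).
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def code_matches(code: str, base: str) -> bool:
    """Return True if `base` is one of the bases covered by `code`."""
    return base.upper() in IUPAC_NUCLEOTIDES.get(code.upper(), "")


def sequence_matches(pattern: str, sequence: str) -> bool:
    """Check a concrete sequence against a pattern of IUPAC codes."""
    if len(pattern) != len(sequence):
        return False
    return all(code_matches(c, b) for c, b in zip(pattern, sequence))


print(sequence_matches("GAYR", "GATA"))
print(sequence_matches("GAYR", "GAAA"))
```

Lower-case input is accepted because both arguments are mapped to upper case before the lookup.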

\\
The amino acid codes supported are:

^ Amino Acid Code ^ Meaning ^   
|       A         | Alanine |
|       B         | Aspartic acid or Asparagine |
|       C         | Cysteine |
|       D         | Aspartic acid |
|       E         | Glutamic acid |
|       F         | Phenylalanine |
|       G         | Glycine |
|       H         | Histidine |
|       I         | Isoleucine | 
|       K         | Lysine |
|       L         | Leucine |
|       M         | Methionine |
|       N         | Asparagine |
|       P         | Proline | 
|       Q         | Glutamine |
|       R         | Arginine |
|       S         | Serine |
|       T         | Threonine |
|       U         | Selenocysteine |
|       V         | Valine |
|       W         | Tryptophan | 
|       Y         | Tyrosine |
|       Z         | Glutamic acid or Glutamine |
|       X         | any | 
|       *         | translation stop | 
|       -         | gap of indeterminate length| 
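
A simple use of this table is checking that a protein sequence contains only supported codes. The Python sketch below is illustrative; ''AMINO_ACID_CODES'' and ''invalid_residues'' are names invented for this example, and only the codes listed in the table are treated as valid.

```python
# The codes listed in the amino acid table, including "*" and "-".
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence: str) -> list:
    """Return the distinct characters that are not in the table, sorted."""
    return sorted(set(sequence.upper()) - AMINO_ACID_CODES)


protein = "MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG"
print(invalid_residues(protein))
print(invalid_residues("MKJ*O1"))
```

An empty list means every character in the sequence is a supported code.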


===== converter =====
[[http://iubio.bio.indiana.edu/soft/molbio/readseq/|Readseq]] for converting sequence formats to FASTA \\
[[http://www.bugaco.com/bioinf/|Nexus to Fasta converter]]\\
[[http://gp2fasta.ovh.org/|GenBank to Fasta converter]]