fasta
Differences
This shows you the differences between two versions of the page.
fasta [2007/12/17 15:35] – heidi | fasta [2008/07/22 13:31] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== FASTA ====== | ====== FASTA ====== | ||
+ | **[[http:// | ||
+ | [[http:// | ||
+ | |||
+ | \\ | ||
FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. | FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. | ||
+ | |||
===== format information ===== | ===== format information ===== | ||
* text based | * text based | ||
+ | * no standard file extension for a text file containing FASTA formatted sequences. FASTA format files often have file extensions like .fa, .mpfa, .fna, .fsa, .fas or .fasta | ||
===== Data type handled ===== | ===== Data type handled ===== | ||
* nucleic acid sequences | * nucleic acid sequences | ||
* peptide sequences | * peptide sequences | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
===== file format ===== | ===== file format ===== | ||
* begins with a single-line description, | * begins with a single-line description, | ||
- | * description line: | ||
- | * ">" | ||
- | * word following is the identifier of the sequence (optional) | ||
- | * rest of the line is the description (optional) | ||
- | * no space between the ">" | ||
* It is recommended that all lines of text be shorter than 80 characters | * It is recommended that all lines of text be shorter than 80 characters | ||
* The sequence ends if another line starting with a ">" | * The sequence ends if another line starting with a ">" | ||
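The line-length recommendation above can be followed when writing a record by breaking the sequence into fixed-width lines. This is a minimal sketch, not a reference implementation: the function name `format_fasta`, its parameters, and the default width of 70 are assumptions chosen for illustration. Only the sequence is wrapped; the header line is written as given.

```python
def format_fasta(identifier, description, sequence, width=70):
    """Return one FASTA record as text, with the sequence wrapped to `width` columns.

    The names and the default width are illustrative assumptions; the page
    only recommends that lines stay shorter than 80 characters.
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    # Header line: ">" directly followed by the identifier, then the description.
    header = ">" + identifier
    if description:
        header += " " + description
    # Sequences contain no whitespace, so slice them into fixed-size chunks.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"
```

Calling `format_fasta("seq1", "demo", "ACGT" * 50, width=60)` would yield a header line followed by sequence lines of at most 60 characters each.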
- | **simple | + | **simple |
< | < | ||
> | > | ||
Line 27: | Line 34: | ||
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX | GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX | ||
IENY | IENY | ||
+ | </ | ||
+ | |||
+ | example of a multiple sequence FASTA file: | ||
+ | < | ||
+ | > | ||
+ | MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG | ||
+ | LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK | ||
+ | IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL | ||
+ | MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL | ||
+ | > | ||
+ | SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI | ||
+ | ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH | ||
</ | </ | ||
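Reading a multiple sequence file like the one above comes down to one rule: a line starting with ">" opens a new record, and every following line belongs to that record until the next ">" line. The sketch below assumes the file content is already in memory as a string; `parse_fasta` is a hypothetical helper name, not part of any library.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    `header` is the description line without the leading ">".
    Sequence lines of one record are joined into a single string.
    """
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ">" line ends the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

For a file on disk, something like `list(parse_fasta(open(path).read()))` would collect all records; for very large files a line-by-line reader would be the more economical design.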
=== Header line === | === Header line === | ||
* begins with ">" | * begins with ">" | ||
- | * gives a name and/ | + | * word following is the identifier |
- | * and other informations | + | * rest of the line is the description (optional) |
- | * Many different sequence databases use standardized headers, which helps when automatically extracting information from the header. | + | * no space between |
- | * header line may contain more than one header separated by a ^A (Control-A) character. | + | * header line may contain more than one header separated by a ^A (Control-A) character |
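The header rules listed here can be sketched as a small function: strip the leading ">", split on Control-A (`\x01`) to separate multiple headers, and treat the first word of each as the identifier and the remainder as the description. The name `split_header` is hypothetical, and splitting on the first space only is an assumption; tab-separated headers are not handled.

```python
def split_header(header_line):
    """Split a FASTA header line into a list of (identifier, description) pairs.

    Both parts are optional, so either may come back as an empty string.
    """
    if not header_line.startswith(">"):
        raise ValueError("a FASTA header line must begin with '>'")
    entries = []
    # Control-A (\x01) separates multiple headers on one line.
    for part in header_line[1:].split("\x01"):
        # The first word is the identifier, the rest is the description.
        identifier, _, description = part.partition(" ")
        entries.append((identifier, description.strip()))
    return entries
```

Because no space is expected between ">" and the identifier, a header that does start with a space is read here as having an empty identifier.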
+ | \\ | ||
+ | * Sequence identifiers: | ||
+ | * Many different sequence databases use standardized headers, which helps when automatically extracting information from the header | ||
+ | * NCBI defined a standard for the unique identifier | ||
+ | * they do not give a definitive description of the FASTA defline format, an attempt to create such a format: | ||
+ | |||
+ | | GenBank | ||
+ | | EMBL Data Library | ||
+ | | DDBJ, DNA Database of Japan | '' | ||
+ | | NBRF PIR | '' | ||
+ | | Protein Research Foundation | ||
+ | | SWISS-PROT | ||
+ | | Brookhaven Protein Data Bank (1) | '' | ||
+ | | Brookhaven Protein Data Bank (2) | '' | ||
+ | | Patents | ||
+ | | GenInfo Backbone Id | '' | ||
+ | | General database identifier | ||
+ | | NCBI Reference Sequence | ||
+ | | Local Sequence identifier | ||
+ | //Note: the gi number is a sequence of digits that identifies an NCBI database entry.// | ||
+ | |||
+ | \\ | ||
=== Sequence representation === | === Sequence representation === | ||
* After the header line and comments | * After the header line and comments | ||
Line 70: | Line 111: | ||
| | | | ||
+ | \\ | ||
The amino acid codes supported are: | The amino acid codes supported are: | ||
Line 102: | Line 144: | ||
- | ===== How to cite ===== | + | |
+ | |||
+ | |||
+ | ===== converter | ||
+ | [[http:// | ||
+ | [[http:// | ||
+ | [[http:// |
fasta.1197902103.txt.gz · Last modified: 2008/07/22 13:30 (external edit)