fasta
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| fasta [2007/12/14 15:22] – created heidi | fasta [2008/07/22 13:31] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== FASTA ====== | ====== FASTA ====== | ||
| - | ===== Program | + | **[[http:// |
| + | [[http:// | ||
| + | |||
| + | \\ | ||
| + | FASTA format is a text-based format for representing either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences. | ||
| + | |||
| + | |||
| + | ===== format | ||
| + | * text based | ||
| + | * there is no standard file extension for text files containing FASTA-formatted sequences; commonly used extensions include .fa, .mpfa, .fna, .fsa, .fas and .fasta | ||
| ===== Data type handled ===== | ===== Data type handled ===== | ||
| - | ===== Input Files ===== | + | * nucleic acid sequences |
| - | ===== How to cite ===== | + | * peptide sequences |
| + | |||
| + | ===== file format | ||
| + | * begins with a single-line description, | ||
| + | * It is recommended that all lines of text be shorter than 80 characters | ||
| + | * The sequence ends if another line starting with a ">" | ||
| + | **simple examples: | ||
| + | < | ||
| + | > | ||
| + | LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV | ||
| + | EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG | ||
| + | LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL | ||
| + | GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX | ||
| + | IENY | ||
| + | </ | ||
| + | |||
| + | Example of a multiple-sequence FASTA file: | ||
| + | < | ||
| + | > | ||
| + | MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG | ||
| + | LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK | ||
| + | IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL | ||
| + | MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL | ||
| + | > | ||
| + | SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI | ||
| + | ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH | ||
| + | </ | ||
| + | |||
| + | === Header line === | ||
| + | * begins with ">" | ||
| + | * the word immediately following the ">" is the identifier and/or name of the sequence (optional) | ||
| + | * rest of the line is the description (optional) | ||
| + | * no space between the ">" | ||
| + | * a header line may contain more than one header, with the headers separated by a ^A (Control-A) character | ||
| + | |||
| + | \\ | ||
| + | * Sequence identifiers: | ||
| + | * Many different sequence databases use standardized headers, which helps when automatically extracting information from the header | ||
| + | * NCBI defined a standard for the unique identifier | ||
| + | * they do not give a definitive description of the FASTA defline format, an attempt | ||
| + | |||
| + | | GenBank | ||
| + | | EMBL Data Library | ||
| + | | DDBJ, DNA Database of Japan | '' | ||
| + | | NBRF PIR | '' | ||
| + | | Protein Research Foundation | ||
| + | | SWISS-PROT | ||
| + | | Brookhaven Protein Data Bank (1) | '' | ||
| + | | Brookhaven Protein Data Bank (2) | '' | ||
| + | | Patents | ||
| + | | GenInfo Backbone Id | '' | ||
| + | | General database identifier | ||
| + | | NCBI Reference Sequence | ||
| + | | Local Sequence identifier | ||
| + | //Note: The gi number is a sequence of digits that identifies a database entry at NCBI.// | ||
| + | |||
| + | \\ | ||
| + | === Sequence representation | ||
| + | * After the header line and comments | ||
| + | * each line of a sequence should have fewer than 80 characters | ||
| + | * Sequences may be protein sequences or nucleic acid sequences | ||
| + | * can contain gaps or alignment characters | ||
| + | * Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: | ||
| + | * lower-case letters are accepted and are mapped into upper-case | ||
| + | * a single hyphen or dash can be used to represent a gap character | ||
| + | * in amino acid sequences: U and * are acceptable letters | ||
| + | * Numerical digits are not allowed but are used in some databases to indicate the position in the sequence | ||
| + | |||
| + | \\ | ||
| + | The nucleic acid codes supported are: | ||
| + | |||
| + | ^ Nucleic Acid Code ^ Meaning | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | |||
| + | \\ | ||
| + | The amino acid codes supported are: | ||
| + | |||
| + | ^ Amino Acid Code ^ Meaning ^ | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | | | ||
| + | |||
| + | ===== converter ===== | ||
| + | [[http:// | ||
| + | [[http:// | ||
| + | [[http:// | ||
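
The rules described on this page (a header line starting with ">", an optional identifier word followed by an optional description, sequence lines that continue until the next ">" line, and lower-case letters mapped to upper-case) can be sketched as a small reader. This is a minimal illustration under stated assumptions, not a reference implementation: the names `FastaRecord` and `parse_fasta` are hypothetical, blank lines are skipped, text before the first header is ignored, and when a header line holds several headers separated by Control-A only the first one is kept.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastaRecord(NamedTuple):
    """One sequence entry (hypothetical name, for illustration only)."""
    identifier: str   # first word after ">" (empty if absent)
    description: str  # rest of the header line (empty if absent)
    sequence: str     # all sequence lines joined and upper-cased


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # Assumption: keep only the first of several Control-A separated headers.
    first_header = header.split("\x01", 1)[0]
    parts = first_header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    # Lower-case letters are accepted and mapped to upper-case.
    return FastaRecord(identifier, description, "".join(chunks).upper())


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per ">" header found in an iterable of lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header ends the previous sequence.
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
        # Lines before the first header are ignored in this sketch.
    if header is not None:
        yield _make_record(header, chunks)


if __name__ == "__main__":
    example = ">seq1 first example\nacgt\nACGT\n>seq2\nMTEI\n"
    for record in parse_fasta(example.splitlines()):
        print(record.identifier, len(record.sequence))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so large multiple-sequence files are read one record at a time rather than loaded whole.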
fasta.1197642149.txt.gz · Last modified: 2008/07/22 13:30 (external edit)