fastq
Revisions compared: fastq [2011/09/05 15:05] – heidi and fastq [2011/09/19 16:19] (current) – heidi
FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. Each sequence letter and each quality score is encoded as a single ASCII character for brevity. The format was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence with its quality data, but it has recently become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.
===== Format =====
  * text-based
  * no standard file extension, but .fq, .fastq, and .txt are commonly used
===== File format =====
  * A FASTQ file normally uses four lines per sequence:
  * Line 1: begins with a '
  * Line 2: is the raw sequence letters (IUPAC ambiguity codes: ACTGNURYSWKMBDHV)
  * Line 3: begins with a '
  * Line 4: encodes the [[http://
  * The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of "
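The four-line layout described above can be read with a few lines of Python. This is only a minimal sketch, not a reference parser: it assumes unwrapped records (exactly four lines each), takes the '@' and '+' line markers from the example records on this page, and assumes that the quality string must be as long as the sequence. The names ''FastqRecord'', ''read_fastq'' and ''IUPAC_CODES'' are made up for this illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO

# Letters listed in the format description above (hypothetical constant name).
IUPAC_CODES = set("ACTGNURYSWKMBDHV")


class FastqRecord(NamedTuple):
    header: str    # line 1 without its leading '@'
    sequence: str  # line 2, raw sequence letters
    quality: str   # line 4, one ASCII character per sequence letter


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")

        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got: {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' on line 3, got: {separator!r}")
        # Assumption: one quality character per sequence letter.
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        unknown = set(sequence.upper()) - IUPAC_CODES
        if unknown:
            raise ValueError(f"unexpected sequence letters {sorted(unknown)}")

        yield FastqRecord(header[1:], sequence, quality)


# Usage: parse a small in-memory record (made-up data).
demo = io.StringIO("@read1\nGATTACA\n+\nIIIIIII\n")
for record in read_fastq(demo):
    print(record.header, record.sequence, record.quality)
```

Wrapped (multi-line) records are deliberately not handled here; a parser that supports them has to count sequence letters rather than rely on line markers.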
\\
==== Example: ====
<code>
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''
</code>
  * FASTQ files from the NCBI/EBI Sequence Read Archive often include a description:
<code>
@SRR001666.1 071112_SLXA-EAS1_s_7:
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
</code>
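Because each quality score is stored as a single ASCII character, the quality line of the Sequence Read Archive example above can be turned back into numbers. The sketch below assumes the Sanger convention (Phred score = ASCII code minus 33) and the usual Phred relation Q = -10 * log10(p). Neither the offset nor the function names ''decode_quality'' and ''error_probability'' come from this page, and other offsets exist, so check which encoding your files use.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Sanger-style encoding, not stated on this page


def decode_quality(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert a quality string into a list of integer Phred scores."""
    scores = [ord(char) - offset for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("quality character below the assumed ASCII offset")
    return scores


def error_probability(score: int) -> float:
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# Quality line copied from the SRA example record.
sra_quality = "IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC"
scores = decode_quality(sra_quality)
print(min(scores), max(scores))
print(error_probability(min(scores)))
```

Under this assumption, 'I' decodes to 40 and '9' to 24, so the few lower-quality positions near the end of the read stand out clearly.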
\\
fastq.1315227930.txt.gz · Last modified: 2011/09/05 15:05 by heidi