HapMap format
- The header line of a HapMap format file looks like:
rs# SNPalleles chrom pos strand genome_build center protLSID assayLSID panelLSID QC_code
followed by a list of sample identifiers, one column per sample.
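A minimal Python sketch of reading the header line. It assumes that columns are separated by whitespace and that the eleven fixed column names are spelled exactly as in the header above. Real files may spell them differently, so treat the strict name check as an assumption. The sample IDs NA06985 and NA06991 in the usage line are hypothetical placeholders.

```python
# Fixed leading columns, in the order given by the header above (assumed
# spelling); every column after these is a sample identifier.
FIXED_COLUMNS = [
    "rs#", "SNPalleles", "chrom", "pos", "strand", "genome_build",
    "center", "protLSID", "assayLSID", "panelLSID", "QC_code",
]


def parse_header(line: str) -> list[str]:
    """Check the fixed columns and return the sample identifiers.

    No column may contain spaces, so a plain whitespace split is
    unambiguous.
    """
    fields = line.split()
    fixed = fields[: len(FIXED_COLUMNS)]
    if fixed != FIXED_COLUMNS:
        raise ValueError(f"unexpected fixed columns: {fixed}")
    return fields[len(FIXED_COLUMNS):]


header = ("rs# SNPalleles chrom pos strand genome_build center "
          "protLSID assayLSID panelLSID QC_code NA06985 NA06991")
print(parse_header(header))  # ['NA06985', 'NA06991']
```

The order of the returned list matches the order of the genotype columns on each data line.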
- A sample line of a HapMap format genotype file:
rs169757 A/C Chr21 9928594 + ncbi_b35.1 broad urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1 urn:LSID:affymetrix.hapmap.org:Assay:1612756:1 urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ AC AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AC AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AC AA AC AA AA AA AA AA AA AA AA AC AA AA AC AC AA AA AA AA AA AA AA AA
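A data line such as the one above can be split into its fixed fields and per-sample genotypes. The following is a minimal Python sketch, assuming whitespace-separated columns. The record type `HapMapRecord` and its field names are hypothetical, chosen for illustration. The usage example keeps only the first three genotypes of the sample line for brevity.

```python
from typing import NamedTuple

N_FIXED = 11  # number of columns before the per-sample genotypes


class HapMapRecord(NamedTuple):
    rs_id: str
    alleles: list[str]      # e.g. ["A", "C"] from "A/C"
    chrom: str
    pos: int
    strand: str
    genome_build: str
    center: str
    prot_lsid: str
    assay_lsid: str
    panel_lsid: str
    qc_code: str
    genotypes: list[str]    # one two-letter call per sample, in header order


def parse_line(line: str) -> HapMapRecord:
    """Parse one whitespace-separated HapMap genotype line."""
    fields = line.split()
    if len(fields) < N_FIXED:
        raise ValueError(f"expected at least {N_FIXED} columns, got {len(fields)}")
    (rs_id, alleles, chrom, pos, strand, build,
     center, prot, assay, panel, qc) = fields[:N_FIXED]
    return HapMapRecord(
        rs_id, alleles.split("/"), chrom, int(pos), strand, build,
        center, prot, assay, panel, qc, fields[N_FIXED:],
    )


sample = ("rs169757 A/C Chr21 9928594 + ncbi_b35.1 broad "
          "urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1 "
          "urn:LSID:affymetrix.hapmap.org:Assay:1612756:1 "
          "urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ AC AA AA")
rec = parse_line(sample)
print(rec.rs_id, rec.alleles, rec.pos, rec.genotypes)
# rs169757 ['A', 'C'] 9928594 ['AC', 'AA', 'AA']
```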
- The following table describes the valid values for each column:
Column header | Description |
---|---|
rs# | A string starting with the letters 'rs' followed by digits, e.g. rs12345 |
SNPalleles | All possible alleles of the SNP (A, C, G or T), separated by forward slashes, e.g. A/G |
chrom | The three-letter string 'Chr' followed by a number from 1 to 22, or by X or Y for the sex chromosomes, e.g. Chr22 |
pos | Position of the SNP on the chromosome, an integer |
strand | A single character, either '+' or '-'. '+' refers to the strand running from the 5-prime telomere to the 3-prime telomere, and '-' to the strand running from the 3-prime telomere to the 5-prime telomere. |
genome_build | A string of characters, e.g. ncbi_b35.1 |
center | A string of characters, e.g. broad |
protLSID | A string of characters, e.g. urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1 |
assayLSID | A string of characters, e.g. urn:lsid:affymetrix.hapmap.org:Assay:1612756:1 |
panelLSID | A string of characters, e.g. urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 |
QC_code | Either 'QC+' or 'QC-' |
genotypes | One column per sample, each a pair of letters chosen from the set {A, C, G, T, N}, e.g. AG |
- Note that all columns are case-sensitive unless stated otherwise, and no spaces are allowed within a column.
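The rules in the table and the note above can be expressed as one regular expression per column. The following is a minimal Python sketch, not an official validator. It makes two assumptions: SNPalleles lists at least two alleles, and the free-text columns (genome_build, center and the three LSIDs) only need to be non-empty, space-free strings. Python regular expressions are case-sensitive by default, which matches the note. The function name `validate` is hypothetical.

```python
import re

# One pattern per fixed column, in file order, transcribed from the table.
PATTERNS = {
    "rs#": re.compile(r"rs\d+"),
    "SNPalleles": re.compile(r"[ACGT](?:/[ACGT])+"),  # assumed: at least two alleles
    "chrom": re.compile(r"Chr(?:[1-9]|1\d|2[0-2]|X|Y)"),
    "pos": re.compile(r"\d+"),
    "strand": re.compile(r"[+-]"),
    "genome_build": re.compile(r"\S+"),  # free text, assumed non-empty
    "center": re.compile(r"\S+"),
    "protLSID": re.compile(r"\S+"),
    "assayLSID": re.compile(r"\S+"),
    "panelLSID": re.compile(r"\S+"),
    "QC_code": re.compile(r"QC[+-]"),
}
GENOTYPE = re.compile(r"[ACGTN]{2}")


def validate(fields: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the line looks valid."""
    if len(fields) < len(PATTERNS):
        return [f"too few columns: {len(fields)}"]
    problems = []
    for (name, pattern), value in zip(PATTERNS.items(), fields):
        if not pattern.fullmatch(value):
            problems.append(f"{name}: {value!r}")
    for i, gt in enumerate(fields[len(PATTERNS):]):
        if not GENOTYPE.fullmatch(gt):
            problems.append(f"genotype {i}: {gt!r}")
    return problems


bad = "rs169757 A/C Chr23 9928594 + ncbi_b35.1 broad p a pan QC+ AX"
print(validate(bad.split()))  # ["chrom: 'Chr23'", "genotype 0: 'AX'"]
```

Because no column may contain spaces, `str.split()` yields exactly one field per column. Using `fullmatch` rather than `match` rejects values with trailing junk, such as `rs12x`.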
hapmap.txt · Last modified: 2008/07/22 13:31 by 127.0.0.1