HapMap format

  • The header line of a HapMap format file lists the column names rs# SNPalleles chrom pos strand genome_build center protLSID assayLSID panelLSID QC_code, followed by a list of sample identifiers.
  • A sample line from a HapMap format genotype file:
    rs169757 A/C Chr21 9928594 + ncbi_b35.1 broad urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1 urn:LSID:affymetrix.hapmap.org:Assay:1612756:1 urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ AC AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AC AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AC AA AC AA AA AA AA AA AA AA AA AC AA AA AC AC AA AA AA AA AA AA AA AA
  • The table below describes the valid values for each column:

Column header   Description
rs#             A string starting with the letters 'rs' followed by digits, e.g. rs12345
SNPalleles      All possible alleles (A, G, C or T) for the SNP, separated by forward slashes, e.g. A/G
Chromo          The three-letter string 'Chr' followed by a number from 1 to 22, or by the letter X or Y for a sex chromosome, e.g. Chr22
Pos             Position of the SNP, an integer
Strand          A single character, either '+' or '-'. '+' refers to the strand running from the 5-prime telomere to the 3-prime telomere, and '-' refers to the strand running from the 3-prime telomere to the 5-prime telomere.
genome_build    A string of characters, e.g. ncbi_b35.1
Center          A string of characters, e.g. broad
protLSID        A string of characters, e.g. urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1
assayLSID       A string of characters, e.g. urn:lsid:affymetrix.hapmap.org:Assay:1612756:1
panelLSID       A string of characters, e.g. urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1
QC_code         Either 'QC+' or 'QC-'
genotypes       One column per sample, each a pair of letters chosen from the set (A, G, C, T, N), e.g. AG


  • Note that all columns are case-sensitive unless stated otherwise, and no spaces are allowed within a column.
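
The column rules above can be checked mechanically. The following is a minimal Python sketch, not an official parser. The function name parse_hapmap_line and the column keys in FIXED_COLUMNS are hypothetical and chosen only for illustration. It also assumes that columns are separated by whitespace (the page does not name the delimiter, but since no spaces are allowed within a column, splitting on whitespace is unambiguous) and that a line carries at least one genotype. Columns described only as "a string of characters" are accepted as they are.

```python
import re

# Fixed columns in header order. The key names are illustrative assumptions.
FIXED_COLUMNS = [
    "rs#", "SNPalleles", "chrom", "pos", "strand", "genome_build",
    "center", "protLSID", "assayLSID", "panelLSID", "QC_code",
]

# One pattern per column whose valid values the table spells out.
# Matching is case-sensitive, as the note above requires.
PATTERNS = {
    "rs#": re.compile(r"rs[0-9]+"),
    "SNPalleles": re.compile(r"[ACGT](/[ACGT])*"),
    "chrom": re.compile(r"Chr([1-9]|1[0-9]|2[0-2]|X|Y)"),
    "pos": re.compile(r"[0-9]+"),
    "strand": re.compile(r"[+-]"),
    "QC_code": re.compile(r"QC[+-]"),
}
GENOTYPE = re.compile(r"[ACGTN]{2}")


def parse_hapmap_line(line):
    """Return (fixed_fields, genotypes) for one genotype line.

    Raises ValueError if a column breaks one of the rules in the table.
    """
    fields = line.split()
    n_fixed = len(FIXED_COLUMNS)
    if len(fields) <= n_fixed:
        raise ValueError("expected 11 fixed columns plus at least one genotype")

    fixed = dict(zip(FIXED_COLUMNS, fields[:n_fixed]))
    genotypes = fields[n_fixed:]

    for name, pattern in PATTERNS.items():
        if not pattern.fullmatch(fixed[name]):
            raise ValueError(f"invalid {name}: {fixed[name]!r}")
    for genotype in genotypes:
        if not GENOTYPE.fullmatch(genotype):
            raise ValueError(f"invalid genotype: {genotype!r}")

    fixed["pos"] = int(fixed["pos"])  # the position is an integer
    return fixed, genotypes


# Usage: the sample line from this page, cut down to three genotypes.
sample = (
    "rs169757 A/C Chr21 9928594 + ncbi_b35.1 broad "
    "urn:LSID:affymetrix.hapmap.org:Protocol:genotype_protocol_1:1 "
    "urn:LSID:affymetrix.hapmap.org:Assay:1612756:1 "
    "urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ AC AA AA"
)
fixed, genotypes = parse_hapmap_line(sample)
print(fixed["chrom"], fixed["pos"], genotypes)
```

Because the patterns are matched with fullmatch, a value such as chr21 (lower-case c) or a genotype such as A is rejected outright; nothing is normalized silently.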
hapmap.txt · Last modified: 2008/07/22 13:31 by 127.0.0.1