ped

Differences

This shows you the differences between two versions of the page.

ped [2011/06/07 16:47] heidi
ped [2011/06/08 09:42] (current) heidi
Line 9: Line 9:
  
 \\ \\
 +
  
  
  
 ===== Program information ===== ===== Program information =====
 +  * written in C/C++
   * Mac   * Mac
   * Windows   * Windows
 +  * Unix
 \\ \\
 +
  
  
Line 23: Line 27:
 ===== Data type handled ===== ===== Data type handled =====
   * diploid   * diploid
-  * AFLP
-  * MICROSAT
-  * Standard
 +  * SNP
  
 \\ \\
 +
 +
 +
  
  
Line 37: Line 42:
  
 ===== Input Files ===== ===== Input Files =====
-  * whitespace (spaces and or tabs) separated text file *.txt/*.dat
-  * first line: ''NumIndivs '' number of individuals
-  * second line: ''NumLoci '' number of loci
-  * third line: ''Digits '' number of digits used to denote a particular allele
-  * fourth line: ''Format '' ''Lumped'' (genotype at a single locus is given by a single number) or ''NonLumped''
-  * next lines: ''LocusNames'' names of all loci separated by whitespace
-  * next lines: genotype data
-    * first character: number of the individual (numbering must be serially)
-    * next characters: genotypes (all on same line or on different lines)
-      * ''Lumped'' format: two alleles are encoded as one number, ''Digits'' specify how many digits are used to represent each locus
-      * ''NonLumped'' format: alleles at each locus are given by a consecutive pair of numbers that are white space separated
-      * Missing data: ''Lumped'': encoded as ''0'', ''NonLumped'': encoded as ''-1'' (each allele at the missing locus must have a ''-1'')
 +  * whitespace (spaces and/or tabs) separated text file *.ped
 +  * each line corresponds to one individual
 +  * the following first 6 columns are mandatory (the IDs are alphanumeric):
 +    * ''Family ID''
 +    * ''Individual ID''
 +    * ''Paternal ID''
 +    * ''Maternal ID''
 +    * ''Sex'' (1=male; 2=female; any other character=unknown)
 +    * ''Phenotype'' (only 1 phenotype! The phenotype can be either a quantitative trait or an affection status column: PLINK will automatically detect which type (i.e. based on whether a value other than 0, 1, 2 or the missing genotype code is observed))
 +  * Comments: line starts with ''#''
 +  * Affection status, by default, should be coded:
 +    * -9 missing
 +    * 0 missing
 +    * 1 unaffected
 +    * 2 affected
 +  * column 7 onwards: Genotypes
 +    * any character (e.g.: 1,2,3,4 or A,C,G,T or anything else)
 +    * missing genotype: ''0''
 +    * all markers must be biallelic (diploid). Either both alleles should be missing or neither. Haploid data: encode them as diploid homozygotes. The two alleles of a genotype are listed one after the other.
  
 \\ \\
 +If explicitly specified, the following columns can be omitted:
 +  * ''Family ID''
 +  * ''Individual ID''
 +  * ''Paternal ID'' and ''Maternal ID''
 +  * ''Sex''
 +  * ''Phenotype''
 +
 +\\
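The PED column layout described above can be sketched as a small parser. This is only an illustration in Python, not part of PLINK: the names ''PedRecord'', ''parse_ped_line'', ''parse_ped'' and ''AFFECTION'' are hypothetical, all six mandatory columns are assumed to be present, and any line whose first non-blank character is ''#'' is treated as a comment.

```python
from typing import List, NamedTuple, Tuple

# Default affection-status codes as listed above; -9 and 0 both mean "missing".
AFFECTION = {"-9": "missing", "0": "missing", "1": "unaffected", "2": "affected"}


class PedRecord(NamedTuple):
    """Hypothetical container for one PED line (one individual)."""
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str        # "male", "female" or "unknown"
    phenotype: str  # raw value: quantitative trait or affection code
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> PedRecord:
    fields = line.split()  # any run of spaces and/or tabs separates columns
    if len(fields) < 6:
        raise ValueError("expected the 6 mandatory columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("genotype columns must come in allele pairs")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    for a1, a2 in genotypes:
        # Either both alleles are missing ("0") or neither is.
        if (a1 == "0") != (a2 == "0"):
            raise ValueError(f"half-missing genotype: {a1} {a2}")
    sex = {"1": "male", "2": "female"}.get(fields[4], "unknown")
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     sex, fields[5], genotypes)


def parse_ped(text: str) -> List[PedRecord]:
    records = []
    for line in text.splitlines():
        # Skip blank lines and comment lines (assumption: leading blanks allowed).
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        records.append(parse_ped_line(line))
    return records
```

The phenotype is kept as a raw string on purpose, because the same column may hold either an affection code (look it up in ''AFFECTION'') or a quantitative value.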
 +
  
-==== AFLP data ==== 
-  * ''Lumped'' format 
-  * ''+'' band is present 
-  * ''-'' band is absent 
-  * ''0'' missing data 
  
-  * data types can be mixed
 +==== MAP files ====
 +  * Each line of the MAP file describes a single marker and must contain exactly 4 columns:
 +    * chromosome (1-22, X, Y, MT or 0 if unplaced)
 +    * rs# or SNP identifier
 +    * genetic distance in morgans (missing: 0)
 +    * base-pair position in bp units (base-pair positions are expected to be positive integers within the range of typical human chromosome sizes)
 +  * The MAP file must contain as many markers as are in the PED file.
 +  * The markers in the PED file do not need to be in genomic order; the order of the lines in the MAP file must match the order of the markers in the PED file.
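A minimal sketch of reading a MAP file with the four columns listed above, again in Python and purely illustrative: ''MapRecord'' and ''parse_map'' are hypothetical names, and the only check performed is the 4-column rule.

```python
from typing import List, NamedTuple


class MapRecord(NamedTuple):
    """Hypothetical container for one MAP line (one marker)."""
    chromosome: str          # kept as text: 1-22, X, Y, MT or 0
    snp_id: str              # rs# or other SNP identifier
    genetic_distance: float  # morgans; 0 means missing
    bp_position: int         # base-pair position


def parse_map(text: str) -> List[MapRecord]:
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected exactly 4 columns, got {len(fields)}"
            )
        chrom, snp_id, dist, pos = fields
        records.append(MapRecord(chrom, snp_id, float(dist), int(pos)))
    return records
```

The records are returned in file order, since that order is what ties each MAP line to a genotype column pair in the PED file.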
  
 \\ \\
 +
 +
 +
  
  
 ==== Example ==== ==== Example ====
-  * ''Lumped'' data file:
 +  * PED files:
 <code> <code>
-NumIndivs 2 +FAM001   0  1  2  A A  G G  A C  
-NumLoci 6 +FAM001   0 0  1  2  A A  A G  0 0 
-Digits 1 +
-Format Lumped +
-LocusNames sAAT1 sAAT2 sAAT3 ADA1 ADA2 ADH +
-1 11 11 11 11 32 +
-21 11 21 11 11 12 +
 </code> </code>
-  * ''NonLumped'' data file: 
 <code> <code>
-NumIndivs 2 +1 0 0 1     A A    A A    A A    A A    A A 
-NumLoci 6 +0 0     A C    A C    A C    A C    A C 
-Digits +3 1 0 0   1   A A    A A    A A    A A    A A 
-Format NonLumped +4 1 0 0 2     A C    A C    A C    A C    A C
-LocusNames sAAT1 sAAT2 sAAT3 ADA1 ADA2 ADH +
-123 143 --144 144 120 122 157 158 144 144  +
-135 135 134 140 144 144 120 122 161 161 144 144 +
 </code> </code>
-  * AFLP data file (4 Microsat loci, 5 AFLP loci):
 +  * MAP files:
 <code> <code>
-NumIndivs 2 +1  rs123456  0  1234555 
-NumLoci 9 + rs234567  0  1237793 
-Digits 1 + rs224534  0  -1237697  
-Format Lumped +1  rs233556   1337456
-LocusNames m1 m2 m3 m4 A1 A2 A3 A4 A5 +
-11 12 13 11 + + + + +
-2 22 33 11 22 - - - - +
-3 12 13 13 11 + - - - ++
 </code> </code>
 +
 +<code>
 +1    snp1     1000
 +X    snp2     1000
 +Y    snp3     1000
 +XY   snp4     1000
 +MT   snp5     1000
 +</code>
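The rule that the MAP file must list as many markers as the PED file contains can be checked mechanically: with all six mandatory columns present, every PED line should have 6 + 2 × (number of MAP lines) fields. The following Python sketch assumes exactly that layout; ''check_ped_against_map'' is a hypothetical helper, not a PLINK function.

```python
from typing import List, Tuple


def check_ped_against_map(ped_text: str, map_text: str) -> List[Tuple[int, int, int]]:
    """Return (line number, fields found, fields expected) for each bad PED line."""
    # One non-blank MAP line per marker.
    n_markers = sum(1 for line in map_text.splitlines() if line.strip())
    expected = 6 + 2 * n_markers  # 6 mandatory columns + 2 alleles per marker
    problems = []
    for lineno, line in enumerate(ped_text.splitlines(), start=1):
        # Blank lines and '#' comment lines carry no genotypes.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        found = len(line.split())
        if found != expected:
            problems.append((lineno, found, expected))
    return problems
```

An empty result means every individual has one allele pair per MAP marker; it says nothing about whether the marker order in the two files actually agrees.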
 +  
 +\\
  
 ===== How to cite ===== ===== How to cite =====
-Anderson, E.C. and Thompson, E.A. (2002) A model-based method for identifying species hybrids using multilocus genetic data. Genetics 160: 1217-1229.
 +Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.
  
  
ped.1307458061.txt.gz · Last modified: 2011/06/07 16:47 by heidi