ped

Differences

This shows you the differences between two versions of the page.

ped [2011/06/07 16:39] – created heidi
ped [2011/06/08 09:42] (current) heidi
Line 1: Line 1:
 ====== PED ====== ====== PED ======
 \\ \\
-**[[http://ib.berkeley.edu/labs/slatkin/eriq/software/software.htm|NewHybrids]]**\\ +**[[http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped|PED]]**\\
-[[http://ib.berkeley.edu/labs/slatkin/eriq/software/new_hybs_doc1_1Beta3.pdf|manual]]+
  
  
 \\ \\
-PED version 1.1 beta (7. April 2003)\\ +PED\\ 
-NewHybrids is a program for computing the posterior distribution that individuals in a sample fall into different hybrid categories. +The "ped" file format is the widely used format for linkage pedigree data and serves as input for the program PLINK. PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
  
 \\ \\
 +
  
  
  
 ===== Program information ===== ===== Program information =====
 +  * written in C/C++
   * Mac   * Mac
   * Windows   * Windows
 +  * Unix
 \\ \\
 +
  
  
Line 24: Line 27:
 ===== Data type handled ===== ===== Data type handled =====
   * diploid   * diploid
-  * AFLP +  * SNP
-  * MICROSAT +
-  * Standard+
  
 \\ \\
 +
 +
 +
  
  
Line 38: Line 42:
  
 ===== Input Files ===== ===== Input Files =====
-  * whitespace (spaces and/or tabs) separated text file *.txt/*.dat +  * whitespace (spaces and/or tabs) separated text file *.ped 
-  * first line: ''NumIndivs '' number of individuals +  * each line corresponds to one individual 
-  * second line: ''NumLoci '' number of loci +  * the following first 6 columns are mandatory (the IDs are alphanumeric): 
-  * third line: ''Digits '' number of digits used to denote a particular allele +    * ''Family ID'' 
-  * fourth line: ''Format '' ''Lumped'' (genotype at a single locus is given by a single number) or ''NonLumped'' +    * ''Individual ID'' 
-  * next lines: ''LocusNames'' names of all loci separated by whitespace  +    * ''Paternal ID'' 
-  * next lines: genotype data +    * ''Maternal ID'' 
-    * first character: number of the individual (numbering must be serial) +    * ''Sex'' (1=male; 2=female; any other character=unknown) 
-    * next characters: genotypes (all on same line or on different lines) +    * ''Phenotype'' (only 1 phenotype! The phenotype can be either a quantitative trait or an affection status column: PLINK will automatically detect which type (i.e. based on whether a value other than 0, 1, 2 or the missing genotype code is observed)) 
-      * ''Lumped'' format: two alleles are encoded as one number, ''Digits'' specify how many digits are used to represent each locus +  * Comments: line starts with ''#'' 
-      * ''NonLumped'' format: alleles at each locus are given by a consecutive pair of numbers that are whitespace separated +  * Affection status, by default, should be coded:  
-      * Missing data: ''Lumped'': encoded as ''0'', ''NonLumped'': encoded as ''-1'' (each allele at the missing locus must have a ''-1'')+    * -9 missing  
 +    * 0 missing 
 +    * 1 unaffected 
 +    * 2 affected 
 +  * column 7 onwards: Genotypes 
 +    * any character (e.g.: 1,2,3,4 or A,C,G,T or anything else) 
 +    * missing genotype: ''0'' 
 +    * all markers must be biallelic (diploid). Either both alleles should be missing or neither. Haploid data: encode them as diploid homozygotes. The two alleles of a marker are listed one after the other.
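
The column layout listed above can be sketched as a small parser. This is only an illustrative sketch, not PLINK's own code: the names ''PedRecord'' and ''parse_ped_line'' are hypothetical, and the sketch assumes all six mandatory columns are present and that ''0'' is the missing-allele code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PedRecord:
    """One individual from a PED file (hypothetical container for this sketch)."""
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str        # '1' = male, '2' = female, anything else = unknown
    phenotype: str  # kept as text; may be affection status or a quantitative trait
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> Optional[PedRecord]:
    """Parse one PED line; return None for blank lines and '#' comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()  # splits on any run of spaces and/or tabs
    if len(fields) < 6:
        raise ValueError("expected at least the 6 mandatory columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("genotype columns must come in allele pairs")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    for a1, a2 in genotypes:
        # Either both alleles are missing ('0') or neither.
        if (a1 == "0") != (a2 == "0"):
            raise ValueError(f"half-missing genotype: {a1} {a2}")
    return PedRecord(*fields[:6], genotypes)


# Usage with a made-up line: three markers follow the six mandatory columns.
record = parse_ped_line("FAM001 1 0 0 1 2 A A G G 0 0")
```

Keeping the phenotype as text avoids guessing whether the column holds an affection status or a quantitative trait; a caller could convert it once the type is known.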
  
 \\ \\
 +If explicitly specified, the following columns can be omitted:
 +  * ''Family ID''
 +  * ''Individual ID''
 +  * ''Paternal ID'' and ''Maternal ID''
 +  * ''Sex''
 +  * ''Phenotype''
 +
 +\\
 +
  
-==== AFLP data ==== 
-  * ''Lumped'' format 
-  * ''+'' band is present 
-  * ''-'' band is absent 
-  * ''0'' missing data 
  
-  * data types can be mixed +==== MAP files ==== 
 +  * Each line of the MAP file describes a single marker and must contain exactly 4 columns:  
 +    * chromosome (1-22, X, Y, MT or 0 if unplaced) 
 +    * rs# or SNP identifier 
 +    * genetic distance (morgans) (missing: 0) 
 +    * base-pair position (bp units); positions are expected to be positive integers within the range of typical human chromosome sizes 
 +  * The MAP file must contain as many markers as are in the PED file. 
 +  * The markers in the PED file do not need to be in genomic order, but the order of the markers in the MAP file must match their order in the PED file. 
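
The MAP rules above (exactly 4 columns per marker, and as many markers as the PED file holds) can be checked with a short sketch. The names ''MapEntry'', ''parse_map_line'' and ''check_marker_count'' are hypothetical helpers written for this page, not part of PLINK, and the sketch assumes a PED line with all six mandatory columns.

```python
from typing import List, NamedTuple


class MapEntry(NamedTuple):
    """One marker from a MAP file (hypothetical container for this sketch)."""
    chromosome: str          # '1'-'22', 'X', 'Y', 'MT' or '0' if unplaced
    snp_id: str              # rs# or other SNP identifier
    genetic_distance: float  # morgans; 0 when missing
    bp_position: int         # base-pair position


def parse_map_line(line: str) -> MapEntry:
    """Parse one MAP line, requiring exactly 4 whitespace-separated columns."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected exactly 4 columns, got {len(fields)}")
    chromosome, snp_id, distance, position = fields
    return MapEntry(chromosome, snp_id, float(distance), int(position))


def check_marker_count(map_entries: List[MapEntry], ped_line: str) -> None:
    """Raise if a PED line does not carry two alleles per MAP marker."""
    # Columns after the 6 mandatory ones are alleles, two per marker.
    n_allele_columns = len(ped_line.split()) - 6
    if n_allele_columns != 2 * len(map_entries):
        raise ValueError(
            f"PED line has {n_allele_columns} allele columns, "
            f"but the MAP file lists {len(map_entries)} markers"
        )


# Usage with made-up data: two markers, so the PED line needs four alleles.
markers = [parse_map_line("1 snpA 0 1000"), parse_map_line("X snpB 0 2000")]
check_marker_count(markers, "FAM001 1 0 0 1 2 A A G G")
```

Because the two files are matched purely by position, the check only compares counts; it cannot detect markers listed in a different order.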
  
 \\ \\
 +
 +
 +
  
  
 ==== Example ==== ==== Example ====
-  * ''Lumped'' data file:+  * PED files:
 <code> <code>
-NumIndivs 2 +FAM001  1  0 0  1  2  A A  G G  A C 
-NumLoci 6 +FAM001  2  0 0  1  2  A A  A G  0 0 
-Digits 1 +
-Format Lumped +
-LocusNames sAAT1 sAAT2 sAAT3 ADA1 ADA2 ADH +
-1 11 11 11 11 32 +
-21 11 21 11 11 12 +
 </code> </code>
-  * ''NonLumped'' data file: 
 <code> <code>
-NumIndivs 2 +1 0 0 1     A A    A A    A A    A A    A A 
-NumLoci 6 +0 0     A C    A C    A C    A C    A C 
-Digits +3 1 0 0   1   A A    A A    A A    A A    A A 
-Format NonLumped +4 1 0 0 2     A C    A C    A C    A C    A C
-LocusNames sAAT1 sAAT2 sAAT3 ADA1 ADA2 ADH +
-123 143 --144 144 120 122 157 158 144 144  +
-135 135 134 140 144 144 120 122 161 161 144 144 +
 </code> </code>
-  * AFLP data file (4 Microsat loci, 5 AFLP loci):+  * MAP files:
 <code> <code>
-NumIndivs 2 +1  rs123456  0  1234555 
-NumLoci 9 +1  rs234567  0  1237793 
-Digits 1 +1  rs224534  0  -1237697  
-Format Lumped +1  rs233556  0  1337456
-LocusNames m1 m2 m3 m4 A1 A2 A3 A4 A5 +
-11 12 13 11 + + + + +
-2 22 33 11 22 - - - - +
-3 12 13 13 11 + - - - ++
 </code> </code>
 +
 +<code>
 +1    snp1     1000
 +X    snp2     1000
 +Y    snp3     1000
 +XY   snp4     1000
 +MT   snp5     1000
 +</code>
 +  
 +\\
  
 ===== How to cite ===== ===== How to cite =====
-Anderson, E.C. and Thompson, E.A. (2002) A model-based method for identifying species hybrids using multilocus genetic data. Genetics 160: 1217-1229. +Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.
  
  
ped.1307457560.txt.gz · Last modified: 2011/06/07 16:39 by heidi