pgd

Differences

This shows you the differences between two versions of the page.

pgd [2011/07/07 15:52] heidi
pgd [2011/10/12 13:19] heidi
Line 2: Line 2:
 Version 1.0\\ Version 1.0\\
 \\ \\
-PGD (Population Genetics Data) is a file format designed to contain population genetics data. The aim of this format is to facilitate the transfer among several population genetics software packages. PGD plays an important role in the new data format converter PGDSpider. +PGD (Population Genetics Data) is a file format designed to contain population genetics data. The aim of this format is to facilitate the transfer among several population genetics software packages. PGD plays an important role in the new data format converter PGDSpider.\\
-PGD is written in XML and is therefore independent of any particular computer system and extensible for future needs. The XML structure can easily be processed by computer programs. An additional XSLT style sheet makes it possible to display the data in an understandable and comprehensive way. This XSLT style sheet is delivered within the PGDSpider download. +
 \\ \\
 +PGD is written in XML and is therefore independent of any particular computer system and extensible for future needs. The XML structure can easily be processed by computer programs. An additional XSLT style sheet makes it possible to display the data in an understandable and comprehensive way. This XSLT style sheet is delivered within the PGDSpider download.\\
 +\\
 +The PGDSpider distribution also includes an XML Schema (PGD_schema.xsd), which defines the structure of the PGD file. The purpose of an XML Schema is to define the legal building blocks of an XML document and the allowable contents (W3Schools, 2008). The provided XML Schema can be used to validate a PGD file.
 +\\
 +\\
 +
  
  
Line 11: Line 15:
 PGD is able to handle following data types: PGD is able to handle following data types:
   * DNA   * DNA
-  * UHTS (Ultra High-Throughput Sequencing data)+  * NGS (Next-Generation Sequencing data)
   * Microsat (coded as number of repeats!)   * Microsat (coded as number of repeats!)
   * RFLP   * RFLP
Line 46: Line 50:
  
 \\ \\
 +
  
 ==== Header block: ==== ==== Header block: ====
Line 62: Line 67:
     * Value: Character     * Value: Character
     * Character which codes missing values     * Character which codes missing values
-  * <gap> (mandatory):+  * <gap> (optional):
     * Value: Character     * Value: Character
     * Character which codes gaps     * Character which codes gaps
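As a rough illustration of the header rules on this page, the sketch below checks a PGD-like document for the header elements that the element table marks as obligatory (numPop, ploidy, missing), and treats gap as optional, as in the newer revision. The wrapping root element `<PGD>`, the function name `missing_header_fields`, and the absence of XML namespaces are assumptions made for the example; they are not taken from the PGD schema, and the shipped PGD_schema.xsd remains the authoritative definition.

```python
import xml.etree.ElementTree as ET

# Header elements marked "x" (obligatory) in the element table of this page.
# <gap> is deliberately left out: the newer revision lists it as optional.
REQUIRED_HEADER_FIELDS = ("numPop", "ploidy", "missing")


def missing_header_fields(xml_text):
    """Return the obligatory header elements that are absent from xml_text."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <header> or a <header> nested in some root element.
    header = root if root.tag == "header" else root.find(".//header")
    if header is None:
        raise ValueError("no <header> element found")
    return [name for name in REQUIRED_HEADER_FIELDS if header.find(name) is None]


# Hypothetical sample: the <PGD> root and all values are invented for the sketch.
SAMPLE = """
<PGD>
  <header title="demo">
    <numPop>2</numPop>
    <missing>?</missing>
  </header>
</PGD>
"""

if __name__ == "__main__":
    print(missing_header_fields(SAMPLE))  # ['ploidy']
```

A check like this only covers the presence of elements; validating against the XML Schema additionally checks values and element order.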
Line 245: Line 250:
   * distanceMatrix: lower triangle with diagonal    * distanceMatrix: lower triangle with diagonal
 \\ \\
 +
 +
 +
 +
 +
 +
  
  
Line 256: Line 267:
 | **header** (attribute: title) | organism |||  o  |  o  |  o  | | **header** (attribute: title) | organism |||  o  |  o  |  o  |
 | | numPop |||  x  |  x  |  x  | | | numPop |||  x  |  x  |  x  |
-| | ploidy * //(--> mixed/1/2/...)//|||  x (a5)  |  x (a5)  | |+| | ploidy * //(--> mixed/1/2/...)//|||  x (a4)  |  x (a4)  | |
 | | missing |||  x  |  x  | | | | missing |||  x  |  x  | |
-| | gap |||   |   | |+| | gap |||   |   | |
 | | gameticPhase //(--> known/unknown)//|||  o  |  o  | | | | gameticPhase //(--> known/unknown)//|||  o  |  o  | |
 | | recessiveData //(--> yes/no)//|||  o  |  o  | | | | recessiveData //(--> yes/no)//|||  o  |  o  | |
 ^^^^^^^^^^^ ^^^^^^^^^^^
 | **dataDescription** (o) | numLoci |||  x  |  x  | | | **dataDescription** (o) | numLoci |||  x  |  x  | |
-| | dataType * //(--> mixed/DNA/NGS/Microsat/RFLP/AFLP/Standard/Frequency/...)//|||  x (a2)  |  x (a2)  |  x  | +| | dataType * //(--> mixed/DNA/NGS/Microsat/RFLP/AFLP/Standard/Frequency/...)//|||  x (a1)  |  x (a1)  |  x  | 
-| | locus (attribute: id) | locusDataType //(--> DNA/NGS/Microsat/RFLP/AFLP/Standard/Frequency/...)//||  a2  |  a2  | |+| | locus (attribute: id) | locusDataType //(--> DNA/NGS/Microsat/RFLP/AFLP/Standard/Frequency/...)//||  a1  |  a1  | |
 | || locusChromosome //(--> number/X/Y/W/Z/mtDNA/...)//||  o  |  o  | | | || locusChromosome //(--> number/X/Y/W/Z/mtDNA/...)//||  o  |  o  | |
 | || locusLocation ||  o  |  o  | | | || locusLocation ||  o  |  o  | |
 | || locusGenic //(--> coding/noncoding)// ||  o  |  o  | | | || locusGenic //(--> coding/noncoding)// ||  o  |  o  | |
 | || locusLength ||  o  |  o  | | | || locusLength ||  o  |  o  | |
-| || locusLinks ||  o  |  o  | |+| || locusLinks //(--> URL)// ||  o  |  o  | |
 | || locusComments ||  o  |  o  | | | || locusComments ||  o  |  o  | |
 ^^^^^^^^^^^ ^^^^^^^^^^^
 | **population** (attribute: name) | popSize |||  x  |  x  |  x  | | **population** (attribute: name) | popSize |||  x  |  x  |  x  |
-| | popGeogCoord * //(lon, lat)//|||  o (a3)  |  o (a3)  |  o  | +| | popGeogCoord * //(lon, lat)//|||  o (a2)  |  o (a2)  |  o  | 
-| | popLingGroup * |||  o (a4)  |  o (a4)  |  o  | +| | popLingGroup * |||  o (a3)  |  o (a3)  |  o  | 
-| | popPloidy * //(--> mixed/1/2/...)//|||  x (a5)  |  x (a5)  | |+| | popPloidy * //(--> mixed/1/2/...)//|||  a4  |  a4  | |
 | | popLoci *<sup>3</sup> //(locus name, locus name,...) --> all locus of same data type// |||  o  |  o  | | | | popLoci *<sup>3</sup> //(locus name, locus name,...) --> all locus of same data type// |||  o  |  o  | |
-| | ind (attribute: name) | indGeogCoord //(lon, lat)//||  o (a3)  |  o (a3)  | | +| | ind (attribute: name) | indGeogCoord //(lon, lat)//||  o (a2)  |  o (a2)  | | 
-| || indLingGroup ||  o (a4)  |  o (a4)  | | +| || indLingGroup ||  o (a3)  |  o (a3)  | | 
-| || indLoci *<sup>4</sup> //(locus name, locus name, ...) --> all locus of same data type// ||   |   | | +| || indLoci *<sup>4</sup> //(locus name, locus name, ...) --> all locus of same data type// ||   |   | | 
-| || indPloidy //(-->1/2/...)//||  a5  |  a5  | |+| || indPloidy //(-->1/2/...)//||  a4  |  a4  | |
 | || indFreq //(absolute Freq)// ||  o  |  o  |  x  | | || indFreq //(absolute Freq)// ||  o  |  o  |  x  |
-| || data *<sup>7</sup> //(locus data, locus data, ...)// ||  x  |  x  | | +| || data *<sup>5</sup> //(locus data, locus data, ...)// ||  x  |  x  | | 
-| || read *<sup>8</sup> | start *<sup>8</sup> | |  x   | | +| || read *<sup>6</sup> (attribute: id) | start *<sup>6</sup> | |  x   | | 
-| ||| length *<sup>8</sup> | |  o  | | +| ||| length *<sup>6</sup> | |  o  | | 
-| ||| data *<sup>8</sup> | |  x  | | +| ||| data *<sup>6</sup> | |  x  | | 
-| ||| quality *<sup>8</sup> | |  o  | |+| ||| quality *<sup>6</sup> | |  o  | |
 ^^^^^^^^^^^ ^^^^^^^^^^^
 | **structure** (attribute: name) (o) | numGroups |||  x  |  x  |  x  | | **structure** (attribute: name) (o) | numGroups |||  x  |  x  |  x  |
Line 294: Line 305:
 | | matrixLabels //(name, name,...)// |||  x  |  x  |  x  | | | matrixLabels //(name, name,...)// |||  x  |  x  |  x  |
 | | matrix //(number (line break)  number, number (line break)...)// |||  x  |  x  |  x  | | | matrix //(number (line break)  number, number (line break)...)// |||  x  |  x  |  x  |
 +
  
  
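The matrix layout described above (lower triangle with diagonal, one row per line) can be expanded into a full symmetric matrix roughly as follows. This is a minimal sketch: it assumes the values within a row are separated by commas, as the "number, number" notation suggests, and that all entries are numeric. The function name `parse_lower_triangle` is hypothetical and is not part of PGDSpider.

```python
def parse_lower_triangle(text):
    """Expand a lower-triangle-with-diagonal matrix into a full symmetric one.

    Row i (0-based) is expected to hold exactly i + 1 comma-separated numbers.
    """
    rows = [
        [float(value) for value in line.split(",")]
        for line in text.strip().splitlines()
        if line.strip()
    ]
    size = len(rows)
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} has {len(row)} values, expected {i + 1}")

    full = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # mirror the entry into the upper triangle
    return full


if __name__ == "__main__":
    # Invented distances for three labels, purely for illustration.
    example = "0\n1.5, 0\n2, 3, 0"
    for line in parse_lower_triangle(example):
        print(line)
```

The length check rejects rows that do not fit the triangle shape, which catches a matrix given without its diagonal.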
Line 304: Line 316:
 *<sup>3</sup> data of the same data type (loci) in all individuals\\ *<sup>3</sup> data of the same data type (loci) in all individuals\\
 *<sup>4</sup> data of different data types (aligned within each locus)\\ *<sup>4</sup> data of different data types (aligned within each locus)\\
-*<sup>7</sup> non-NGS data\\ +*<sup>5</sup> non-NGS data\\ 
-*<sup>8</sup> NGS data (Next Generation Sequencing)\\+*<sup>6</sup> NGS data (Next Generation Sequencing)\\
  
 x: obligatory\\ x: obligatory\\
Line 312: Line 324:
  
 \\ \\
 +
 +
 +
 +
 +
 +
  
  
 ===== PGD file examples: ===== ===== PGD file examples: =====
-  * aligned data: [[PGD_aligned]] +  * Data of two loci with Standard data type from four diploid populations: [[PGD_standard]] 
-  * alinged data with different data types: [[PGD_aligned_DiffDataTypes]] +  * Data of two loci with different data types (Standard and DNA) from two diploid populations: [[PGD_diffDataTypes]] 
-  * UHTS data with mixed number of repeats: [[PGD_unaligned_mixedReads]]+  * NGS data of two loci from three haploid populations: [[PGD_NGS]]
  
 \\ \\
pgd.txt · Last modified: 2016/02/22 15:41 by heidi