pgd
                Differences
This shows you the differences between two versions of the page.
| pgd [2011/07/07 16:08] – heidi | pgd [2016/02/22 15:41] (current) – heidi | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== PGD - Population Genetics Data format ====== | ====== PGD - Population Genetics Data format ====== | ||
| - | Version 1.0\\ | + | Version 1.1\\ | 
| \\ | \\ | ||
| - | PGD (Population Genetics Data) is a file format designed to contain population genetics data. The aim of this format is to facilitate data transfer among population genetics software packages. PGD plays an important role in the new data format converter PGDSpider. | + | PGD (Population Genetics Data) is a file format designed to contain population genetics data. The aim of this format is to facilitate data transfer among population genetics software packages. PGD plays an important role in the new data format converter PGDSpider.\\ | 
| - | PGD is written in XML and is therefore independent of any particular computer system and extensible for future needs. The XML structure can easily be processed by computer programs. An additional XSLT style sheet makes it possible to display the data in an understandable and comprehensive way. This XSLT style sheet is delivered within the PGDSpider download. | + | |
| \\ | \\ | ||
| + | PGD is written in XML and is therefore independent of any particular computer system and extensible for future needs. The XML structure can easily be processed by computer programs. An additional XSLT style sheet makes it possible to display the data in an understandable and comprehensive way. This XSLT style sheet is delivered within the PGDSpider download.\\ | ||
| + | \\ | ||
| + | The PGDSpider distribution also includes an XML Schema (PGD_schema.xsd), | ||
| + | \\ | ||
| + | \\ | ||
| + | |||
| Line 11: | Line 15: | ||
| PGD is able to handle the following data types: | PGD is able to handle the following data types: | ||
| * DNA | * DNA | ||
| - | * UHTS (Ultra High-Throughput | + | * NGS (Next-Generation | 
| * Microsat (coded as number of repeats!) | * Microsat (coded as number of repeats!) | ||
| * RFLP | * RFLP | ||
| Line 46: | Line 50: | ||
| \\ | \\ | ||
| + | |||
| ==== Header block: ==== | ==== Header block: ==== | ||
| Line 62: | Line 67: | ||
| * Value: Character | * Value: Character | ||
| * Character which codes missing values | * Character which codes missing values | ||
| - | * <gap> (mandatory): | + | * <gap> (optional): | 
| * Value: Character | * Value: Character | ||
| * Character which codes gaps | * Character which codes gaps | ||
| Line 106: | Line 111: | ||
| * Value: Integer | * Value: Integer | ||
| * Gives the length of the locus in number of bases | * Gives the length of the locus in number of bases | ||
| + | * < | ||
| + | * Value: String | ||
| + | * Gives the ancestral state of the locus | ||
| * < | * < | ||
| * Value: String | * Value: String | ||
| Line 137: | Line 145: | ||
| * Defines the names of the loci in the data for this population, separated by commas | * Defines the names of the loci in the data for this population, separated by commas | ||
| * The loci have to be of the same type | * The loci have to be of the same type | ||
| - | * If the data are nucleotide sequence data, just include one locus in this tag (if more than one locus exist one has to repeat the whole <ind> tag for every locus and specify the locus name in the < | ||
| - | <ind name=" | ||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - |  | ||
| - | <ind name=" | ||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | </ | ||
| - | |||
| * <ind> with attribute “name=” (mandatory): | * <ind> with attribute “name=” (mandatory): | ||
| * Defines the different individuals in this population | * Defines the different individuals in this population | ||
| Line 167: | Line 161: | ||
| * Only if the data are of different data types in this population | * Only if the data are of different data types in this population | ||
| * Defines the loci names of the data with the same data type in this individual separated by “,” | * Defines the loci names of the data with the same data type in this individual separated by “,” | ||
| - | * The loci must be of the same data type and all loci of the same data type have to be included (only one loci tag allowed per data type, except for nucleotide sequence data) | + | * The loci must be of the same data type | 
| - | * If the data are nucleotide sequence data, just include one loci in this tag (if more than one loci exist you have to repeat the whole ind tag for every loci) | + | |
| * < | * < | ||
| * Value: Integer | * Value: Integer | ||
| Line 240: | Line 233: | ||
| * only one per file | * only one per file | ||
| \\ | \\ | ||
| - | * Microsat data are coded as the number of repeats | + | * Microsat data are coded as the number of repeats | 
| - | * Only one loci tag is allowed per data type, i.e. all loci of the same data type have to be in the same tag | + | |
| - | * Nucleotide sequence data: just one locus in one tag (popLoci/ indLoci tag). If more than one loci exist repeat the whole ind tag for every loci | + | |
| * distanceMatrix: | * distanceMatrix: | ||
| \\ | \\ | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| Line 261: | Line 257: | ||
| | | ploidy * //(--> mixed/ | | | ploidy * //(--> mixed/ | ||
| | | missing ||| x | x | | | | | missing ||| x | x | | | ||
| - | | | gap ||| | + | | | gap ||| | 
| | | gameticPhase //(--> known/ | | | gameticPhase //(--> known/ | ||
| | | recessiveData //(--> yes/ | | | recessiveData //(--> yes/ | ||
| Line 272: | Line 268: | ||
| | || locusGenic //(--> coding/ | | || locusGenic //(--> coding/ | ||
| | || locusLength || o | o | | | | || locusLength || o | o | | | ||
| - | | || locusLinks || o | o | | | + | | || locusAncestralState || o | o | | | 
| + | | || locusLinks | ||
| | || locusComments || o | o | | | | || locusComments || o | o | | | ||
| ^^^^^^^^^^^ | ^^^^^^^^^^^ | ||
| Line 278: | Line 275: | ||
| | | popGeogCoord * //(lon, lat)// | | | popGeogCoord * //(lon, lat)// | ||
| | | popLingGroup * ||| o (a3) | o (a3) | o | | | | popLingGroup * ||| o (a3) | o (a3) | o | | ||
| - | | | popPloidy * //(--> mixed/ | + | | | popPloidy * //(--> mixed/ | 
| | | popLoci *< | | | popLoci *< | ||
| | | ind (attribute: name) | indGeogCoord //(lon, lat)// | | | ind (attribute: name) | indGeogCoord //(lon, lat)// | ||
| | || indLingGroup || o (a3) | o (a3) | | | | || indLingGroup || o (a3) | o (a3) | | | ||
| - | | || indLoci *< | + | | || indLoci *< | 
| | || indPloidy // | | || indPloidy // | ||
| | || indFreq //(absolute Freq)// || o | o | x | | | || indFreq //(absolute Freq)// || o | o | x | | ||
| | || data *< | | || data *< | ||
| - | | || read *< | + | | || read *< | 
| | ||| length *< | | ||| length *< | ||
| | ||| data *< | | ||| data *< | ||
| Line 316: | Line 313: | ||
| \\ | \\ | ||
| + | |||
| + | |||
| + | |||
| + | |||
| Line 322: | Line 323: | ||
| ===== PGD file examples: ===== | ===== PGD file examples: ===== | ||
| * Data of two loci with Standard data type from four diploid populations: | * Data of two loci with Standard data type from four diploid populations: | ||
| - | * alinged data with different data types: [[PGD_aligned_DiffDataTypes]] | + | * Data of two loci with different data types (Standard and DNA) from two diploid populations: [[PGD_diffDataTypes]] | 
| - | * UHTS data with mixed number | + | * NGS data of two loci from three haploid populations: [[PGD_NGS]] | 
| \\ | \\ | ||
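
One change recorded in this diff is that the `<gap>` header tag went from mandatory to optional in version 1.1. The following is a minimal Python sketch of how a reader might handle that, not code from PGDSpider. Only the tag names `<missing>` and `<gap>` and their "Character" value type come from the page; the `<header>` wrapper element, the sample values `?` and `-`, the function name `read_codes`, and the choice to treat `<missing>` as required are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragments. Only <missing> and <gap> are tag names taken
# from the page; the <header> wrapper and the sample characters are assumed.
HEADER_WITH_GAP = "<header><missing>?</missing><gap>-</gap></header>"
HEADER_WITHOUT_GAP = "<header><missing>?</missing></header>"


def read_codes(header_xml):
    """Return (missing_char, gap_char); gap_char is None when <gap> is absent."""
    header = ET.fromstring(header_xml)

    # Assumption: <missing> is required and holds a single character.
    missing = header.findtext("missing")
    if missing is None or len(missing.strip()) != 1:
        raise ValueError("<missing> must hold exactly one character")

    # <gap> is optional as of version 1.1, so its absence is not an error.
    gap = header.findtext("gap")
    if gap is not None:
        gap = gap.strip()
        if len(gap) != 1:
            raise ValueError("<gap> must hold exactly one character")

    return missing.strip(), gap
```

Calling `read_codes(HEADER_WITHOUT_GAP)` returns `None` for the gap character instead of failing, which is the behaviour a version 1.1 file would need. A reader for version 1.0 files would instead reject a header without `<gap>`.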
pgd.1310047724.txt.gz · Last modified: 2011/07/07 16:08 by heidi