
PGDSpider


PGDSpider version 2.0.6.0 (July 2014)

PGDSpider is a powerful automated data conversion tool for population genetics and genomics programs. It facilitates data exchange between programs for a wide range of data types (e.g. DNA, RNA, NGS, microsatellite, SNP, RFLP, AFLP, multi-allelic data, allele frequencies or genetic distances). Besides the conventional population genetics formats, PGDSpider supports population genomics data formats commonly used to store and handle next-generation sequencing (NGS) data. Currently, PGDSpider is not meant to convert very large NGS files, because it loads the whole input file into memory and the file size may exceed the available RAM. However, since PGDSpider can convert specific subsets of these NGS files into any other format, this feature can be used to calculate parameters or statistics for specific regions, and thus to perform sliding-window analyses over large genomic regions.
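The sliding-window idea mentioned above can be sketched as follows. This is only an illustration of how window coordinates might be generated so that each window can then be converted as a separate subset; the function name `sliding_windows` and the 1-based inclusive coordinates are assumptions, and how a subset is actually selected in PGDSpider is not described on this page.

```python
def sliding_windows(region_start, region_end, window_size, step):
    """Yield (start, end) pairs (1-based, inclusive) covering a region.

    Hypothetical helper: the last window is truncated at region_end.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    start = region_start
    while start <= region_end:
        end = min(start + window_size - 1, region_end)
        yield (start, end)
        if end == region_end:
            break  # the region is fully covered
        start += step


if __name__ == "__main__":
    # Overlapping 4-position windows moved in steps of 3 over positions 1..10
    for window in sliding_windows(1, 10, 4, 3):
        print(window)
```

Each resulting interval would correspond to one subset conversion, which keeps the amount of data loaded per run small.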

PGDSpider uses a newly developed PGD (Population Genetics Data) format as an intermediate step in the conversion process. PGD is a file format designed to store various kinds of population genetics data, including different data types (e.g. DNA sequences, microsatellites, AFLP or SNPs) and ploidy levels. PGD is based on the XML format and is therefore independent of any particular computer system and extensible for future needs. PGDSpider uses PGD to connect population genetics and genomics programs like a spider knits a web.
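The benefit of an intermediate format can be illustrated with a toy sketch: each format needs only one reader into, and one writer out of, a shared representation, instead of a dedicated converter for every pair of formats. The code below is not PGDSpider's implementation and does not use the PGD schema; the two toy formats, the record layout and all function names are hypothetical.

```python
# Toy hub-and-spoke converter. The shared representation is a list of
# dicts; in PGDSpider this role is played by the XML-based PGD format.

def read_comma(text):
    """Parse 'sample,allele1,allele2' lines into the shared representation."""
    records = []
    for line in text.strip().splitlines():
        sample, *alleles = line.split(",")
        records.append({"sample": sample, "alleles": alleles})
    return records


def read_tab(text):
    """Parse tab-separated 'sample<TAB>allele1<TAB>allele2' lines."""
    records = []
    for line in text.strip().splitlines():
        sample, *alleles = line.split("\t")
        records.append({"sample": sample, "alleles": alleles})
    return records


def write_comma(records):
    return "\n".join(",".join([r["sample"], *r["alleles"]]) for r in records)


def write_space(records):
    return "\n".join(" ".join([r["sample"], *r["alleles"]]) for r in records)


READERS = {"comma": read_comma, "tab": read_tab}
WRITERS = {"comma": write_comma, "space": write_space}


def convert(text, in_format, out_format):
    """Convert by reading into the shared form, then writing from it."""
    return WRITERS[out_format](READERS[in_format](text))


if __name__ == "__main__":
    print(convert("ind1,A,T\nind2,G,G", "comma", "space"))
```

Adding a new format in this design means writing one reader and/or one writer, after which it can be combined with every format already supported.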

PGDSpider is written in Java and is therefore platform independent. It is user friendly thanks to its intuitive graphical user interface. PGDSpider allows users to store their preferred conversion settings for repeated conversions of similar input formats. A command line version of PGDSpider is also provided, making it possible to embed PGDSpider in data analysis pipelines.
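A pipeline step that calls the command line version might look like the sketch below. The jar name `PGDSpider2-cli.jar`, the option names (`-inputfile`, `-inputformat`, `-outputfile`, `-outputformat`, `-spid`) and the format keywords are assumptions that are not stated on this page and should be checked against the PGDSpider manual; the helper `build_pgdspider_command` is hypothetical. The function only builds the argument list, so it can be inspected before anything is executed.

```python
import shlex


def build_pgdspider_command(input_file, input_format, output_file,
                            output_format, spid_file=None,
                            jar_path="PGDSpider2-cli.jar", max_heap="1024m"):
    """Build the argument list for one conversion (nothing is executed).

    Assumption: jar name and option names follow the PGDSpider manual.
    spid_file is a file of stored conversion settings, if one is used.
    """
    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", jar_path,
        "-inputfile", input_file, "-inputformat", input_format,
        "-outputfile", output_file, "-outputformat", output_format,
    ]
    if spid_file is not None:
        cmd += ["-spid", spid_file]
    return cmd


if __name__ == "__main__":
    cmd = build_pgdspider_command("input.vcf", "VCF", "output.arp",
                                  "ARLEQUIN", spid_file="vcf_to_arp.spid")
    # Print the shell-quoted command; run it with
    # subprocess.run(cmd, check=True) once the names are verified.
    print(shlex.join(cmd))
```

Because PGDSpider loads the whole input file into memory, the Java heap limit (`-Xmx`) is exposed as a parameter here rather than hard-coded.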


download PGDSpider

PGDSpider

PGDSpider - manual


System requirements:
PGDSpider is written in Java and is therefore platform independent, but the Sun Java Runtime Environment (JRE) 1.6 or a newer version has to be installed.

How to cite PGDSpider and License

Lischer HEL and Excoffier L (2012) PGDSpider: An automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics 28: 298-299.


Copyright © 2007-2014, Heidi E.L. Lischer. All rights reserved.
PGDSpider is distributed under the BSD 3-Clause License. For the full text of the license, see the file LICENSE.txt.
By using, modifying or distributing this software, you agree to be bound by the terms of this license.


Input (read) formats

  • PGD: Version 1.1
  • Arlequin: Version 3.5 (24. February 2010)
  • BAM: (17. April 2011)
  • BAPS: Version 5.4 (29. April 2010)
  • BATWING: (2003)
  • BCF: (14. May 2011)
  • CONVERT: Version 1.31 (March 2005)
  • Eigensoft (EIGENSTRAT, ANCESTRYMAP): Version 5.0.2 (April 2014)
  • FASTA: only the following sequence identifier types are supported:
      • GenBank: gi|gi-number|gb|accession|locus
      • EMBL Data Library: gi|gi-number|emb|accession|locus
      • DDBJ, DNA Database of Japan: gi|gi-number|dbj|accession|locus
      • General database identifier: gnl|database|identifier
      • Simple identifier: identifier
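The identifier types listed for FASTA can be checked before conversion with a small script such as the one below. It is a rough sketch based only on the patterns shown in the list; the function name `classify_fasta_header` is hypothetical, and PGDSpider's own parser may be stricter or more lenient than these regular expressions.

```python
import re

# One pattern per identifier type from the list above. Fields are
# separated by "|"; accession and locus are allowed to be empty.
HEADER_PATTERNS = [
    ("GenBank", re.compile(r"^gi\|[^|]+\|gb\|[^|]*\|[^|]*$")),
    ("EMBL", re.compile(r"^gi\|[^|]+\|emb\|[^|]*\|[^|]*$")),
    ("DDBJ", re.compile(r"^gi\|[^|]+\|dbj\|[^|]*\|[^|]*$")),
    ("General", re.compile(r"^gnl\|[^|]+\|[^|]+$")),
]


def classify_fasta_header(line):
    """Return the identifier type of a FASTA header line, or None.

    Only the first whitespace-delimited token after '>' is examined;
    any free-text description following it is ignored.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    tokens = line[1:].split()
    if not tokens:
        return None  # empty header
    identifier = tokens[0]
    if "|" not in identifier:
        return "Simple"
    for name, pattern in HEADER_PATTERNS:
        if pattern.match(identifier):
            return name
    return None  # a database-style identifier not in the list


if __name__ == "__main__":
    for header in (">gi|12345|gb|AB000001|LOCUS1 example", ">sample_01"):
        print(header, "->", classify_fasta_header(header))
```

Headers for which the function returns `None` would be candidates for renaming before the file is passed to the converter.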


  • FSTAT: Version 2.9.3.2 (February 2002)
  • GDA: Version 1.1 (7. January 2002)
  • Geneland: (12. April 2011)
  • GENEPOP: Version 4.1 (24. March 2011)
  • GENETIX: Version 4.05 (5. May 2004)
  • HGDP: Stanford
  • HGDP-CEPH (Arlequin + log file): Version 3.0
  • Immanc (BayesAss): Version 5.0 (8. October 1998)
  • IM (IMa): updated 17. December 2009
  • IMa2: updated 26. September 2011
  • MEGA: Version 5 (24. April 2011)
  • MIGRATE: Version 3.2.6 (13. October 2010)
  • MSA: Version 4.05
  • NewHybrids: Version 1.1 beta (7. April 2003)
  • ONeSAMP: Version 1.2
  • PHYLIP (DNA/ RNA /distance matrix): Version 3.69 (September 2009)
  • SAM: Version 1.4 (17. April 2011)
  • STRUCTURE: Version 2.3.3 (January 2010)
  • VCF: 4.1 (2. August 2012) without structural variants (only SNP and INDELs)

Output (write) formats

  • PGD: Version 1.0
  • Arlequin: Version 3.5 (24. February 2010)
  • BAM: (17. April 2011)
  • BAMOVA: Version 1.02 (27. September 2011)
  • BAPS: Version 5.4 (29. April 2010)
  • BATWING: (2003)
  • BCF: (14. May 2011)
  • Eigensoft (EIGENSTRAT, ANCESTRYMAP): Version 5.0.2 (April 2014)
  • FDist2 (datacal)
  • FSTAT: Version 2.9.3.2 (February 2002)
  • GDA: Version 1.1 (7. January 2002)
  • Geneland: (12. April 2011)
  • GENEPOP: Version 4.1 (24. March 2011)
  • GENETIX: Version 4.05 (5. May 2004)
  • GESTE / BayeScan: Version 2.0 / 2.01
  • Immanc (BayesAss): Version 5.0 (8. October 1998)
  • IM (IMa): updated 17. December 2009
  • IMa2: updated 26. September 2011
  • KML: Version 2.2
  • MEGA: Version 5 (24. April 2011)
  • MIGRATE: Version 3.2.6 (13. October 2010)
  • MSA: Version 4.05
  • MSVar: Version 0.4.1.b (7. April 1999)
  • NewHybrids: Version 1.1 beta (7. April 2003)
  • ONeSAMP: Version 1.2
  • PHYLIP (DNA/ RNA /distance matrix): Version 3.69 (September 2009)
  • SAM: Version 1.4 (17. April 2011)
  • STRUCTURE: Version 2.3.3 (January 2010)
  • VCF: 4.1 (2. August 2012) without structural variants (only SNP and INDELs)


future formats


future extensions

  • enhanced error catching


file extensions and handled data types

handled data types
format extension NGS DNA RNA Microsat SNP RFLP AFLP Standard Allele frequency distance
Arlequin .arp x x x x x x x
BAM .bam x x x
BAPS .txt x x x x x
BATWING .txt x x
BCF .bcf x x x x
CONVERT .txt x x x x x
FASTA no standard file extension, .fa, .mpfa, .fna, .fsa, .fas, .fasta or .txt x x x
FASTQ no standard file extension, .fastq, .fq or .txt x
FDist2 (datacal) no standard file extension x x x x x x x
FSTAT .dat x x x x
GDA .nex x x x x x
GENELAND .txt x x x x x
GENEPOP .txt x x x x
GENETIX .gtx x x x x x
GESTE / BayeScan no standard file extension x x x x
HGDP-CEPH .arp (x) (x) x (x) (x) (x)
Immanc .inp or .txt x x x x x
IM (IMa) .u or .txt x x
IMa2 .u or .txt x x
KML .kml
MEGA .meg x x x
MIGRATE no standard file extension, .txt x x x x x
MSA .dat, .txt x
MSVar no standard file extension x
NewHybrids .dat, .txt x x x x
NEXUS .nex x x
PED .ped x
PGD .xml x x x x x x x x x
PHYLIP .txt x x x
SAM .sam x x x
STRUCTURE no standard file extension x x x x x
VCF .vcf x x x x
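The table shows that a file extension often does not identify a format uniquely (for example, `.txt` is used by many formats and `.arp` by two). The lookup below, built from the extension column above, makes that ambiguity explicit. It is only a sketch: the function name `candidate_formats` is hypothetical, and formats listed with no standard file extension have an empty entry.

```python
import os

# Extensions per format, transcribed from the table above.
FORMAT_EXTENSIONS = {
    "Arlequin": (".arp",), "BAM": (".bam",), "BAPS": (".txt",),
    "BATWING": (".txt",), "BCF": (".bcf",), "CONVERT": (".txt",),
    "FASTA": (".fa", ".mpfa", ".fna", ".fsa", ".fas", ".fasta", ".txt"),
    "FASTQ": (".fastq", ".fq", ".txt"), "FDist2 (datacal)": (),
    "FSTAT": (".dat",), "GDA": (".nex",), "GENELAND": (".txt",),
    "GENEPOP": (".txt",), "GENETIX": (".gtx",), "GESTE / BayeScan": (),
    "HGDP-CEPH": (".arp",), "Immanc": (".inp", ".txt"),
    "IM (IMa)": (".u", ".txt"), "IMa2": (".u", ".txt"), "KML": (".kml",),
    "MEGA": (".meg",), "MIGRATE": (".txt",), "MSA": (".dat", ".txt"),
    "MSVar": (), "NewHybrids": (".dat", ".txt"), "NEXUS": (".nex",),
    "PED": (".ped",), "PGD": (".xml",), "PHYLIP": (".txt",),
    "SAM": (".sam",), "STRUCTURE": (), "VCF": (".vcf",),
}


def candidate_formats(filename):
    """Return all formats (in table order) that use the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    return [name for name, extensions in FORMAT_EXTENSIONS.items()
            if extension in extensions]


if __name__ == "__main__":
    for name in ("samples.arp", "reads.bam", "data.txt"):
        print(name, "->", candidate_formats(name))
```

When the lookup returns more than one candidate, or none, the format has to be chosen explicitly rather than inferred from the file name.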
pgdspider.txt · Last modified: 2016/02/22 15:39 by heidi