baps

Differences

This shows you the differences between two versions of the page.

baps [2008/06/16 09:50] heidi baps [2013/02/20 13:24] (current) heidi
Line 4: Line 4:
  
 \\ \\
-Version 5.1\\ +Version 5.4 (29.04.2010)\\
 A program for Bayesian inference of the genetic structure in a population. Assigns individuals to genetic clusters by considering them either as immigrants (mixture analysis) or as descendants of immigrants (admixture analysis). A program for Bayesian inference of the genetic structure in a population. Assigns individuals to genetic clusters by considering them either as immigrants (mixture analysis) or as descendants of immigrants (admixture analysis).
 +
 +
  
  
 ===== Program information ===== ===== Program information =====
-  * Windows XP/2000/Vista +  * Windows XP/Vista/7 (32-bit, 64-bit) 
-  * Mac OS X +  * Mac OS X Snow Leopard (64-bit) 
-  * Linux +  * Linux (32-bit) 
 + 
 + 
  
  
Line 17: Line 22:
 ===== Data type handled ===== ===== Data type handled =====
   * haploid/diploid/(tetraploid)   * haploid/diploid/(tetraploid)
-  * SNP +  * DNA 
 +  * SNP (sequence/numeric)
   * AFLP   * AFLP
   * Microsatellite   * Microsatellite
-  * multi-allelic markers +  * Standard (multi-allelic markers)
  
  
Line 72: Line 78:
  
 \\ \\
 +
 +
  
  
Line 82: Line 90:
   * Last column contains the index of the group that is the origin of the alleles on the particular row (instead of specifying the individual)   * Last column contains the index of the group that is the origin of the alleles on the particular row (instead of specifying the individual)
   * the names of the groups can be given in a separate file   * the names of the groups can be given in a separate file
 +
 +\\
 +**example** (data from four distinct groups)
 +  * data file:<code>
 +5         1
 +7         1
 +5         1
 +3         2
 +2         2
 +-999  5     3
 +5     -999  4
 +2         4
 +3         4
 +2         4
 +</code>
 +
 +  * name file:<code>
 +American
 +European
 +African
 +Asian
 +</code>
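
A minimal Python sketch of how a group-wise data file and its name file could be read together. It assumes that group indices are 1-based and refer to the line order of the name file, that columns are whitespace-separated, and that -999 marks a missing allele; the function name `parse_groupwise` and the sample values are hypothetical and not part of BAPS.

```python
def parse_groupwise(data_text, names_text, missing=-999):
    """Collect allele rows per named group.

    Assumption: the last column of every data row is a 1-based group
    index that points at the corresponding line of the name file.
    """
    names = [ln.strip() for ln in names_text.splitlines() if ln.strip()]
    groups = {}
    for line in data_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        *alleles, group = (int(f) for f in fields)
        if not 1 <= group <= len(names):
            raise ValueError(f"group index {group} has no name")
        # Replace the missing-value code with None for easier handling.
        row = [None if a == missing else a for a in alleles]
        groups.setdefault(names[group - 1], []).append(row)
    return groups


# Hypothetical two-locus data set with two groups.
data = "5 3 1\n7 3 1\n-999 5 2\n2 4 2\n"
names = "American\nEuropean\n"
result = parse_groupwise(data, names)
```

Raising an error for an index without a name catches a name file that is shorter than the number of groups used in the data file.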
  
 === GENEPOP format: === === GENEPOP format: ===
Line 123: Line 153:
  
 \\ \\
 +
 +
 +
 +
  
  
 ==== Spatial clustering: ==== ==== Spatial clustering: ====
-Same as above, except for the coordinate values that need to be given in a separate file: +Same as the first two above, except for the coordinate values that need to be given in a separate file: 
-  * as many rows as there are individuals (spatial clustering of individuals) or groups (spatial clustering of groups) in the molecular data set.+  * as many rows as there are individuals (spatial clustering of individuals -> sampling coordinates of each individual) or groups (spatial clustering of groups -> sampling coordinates of each group) in the molecular data set.
   * missing coordinate: two consecutive zeros   * missing coordinate: two consecutive zeros
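
The two rules above (one row per individual or group, two consecutive zeros for a missing coordinate) can be checked before running BAPS. The following Python sketch is only an illustration: the function name `read_coordinates` is hypothetical, whitespace-separated columns are an assumption, and apart from the first row the sample values are invented.

```python
def read_coordinates(text, n_units):
    """Parse a coordinate file and check it against the data set size.

    n_units is the number of individuals (or groups) in the molecular
    data set. A row of two zeros is returned as None (missing coordinate).
    """
    coords = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        if len(fields) != 2:
            raise ValueError(f"expected two values per row, got: {line!r}")
        x, y = float(fields[0]), float(fields[1])
        coords.append(None if x == 0 and y == 0 else (x, y))
    if len(coords) != n_units:
        raise ValueError(
            f"{len(coords)} coordinate rows for {n_units} individuals/groups"
        )
    return coords


# Three units, the second one without a known sampling location.
coords = read_coordinates("172 88\n0 0\n171 89\n", n_units=3)
```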
  
 \\ \\
 **example:** **example:**
-  * Data flie: see first example +  * Data file: see first example
   * Coordinate file: <code>   * Coordinate file: <code>
 172  88 172  88
Line 142: Line 176:
  
 \\ \\
 +
  
  
 ==== Clustering of linked molecular data (sequence data): ==== ==== Clustering of linked molecular data (sequence data): ====
 === MLST data format: === === MLST data format: ===
 +  * for prokaryotic organisms
   * first column: identifier where the numbering should go linearly from 1 to number of isolates (unique for each)   * first column: identifier where the numbering should go linearly from 1 to number of isolates (unique for each)
   * second column: unique ID label for each isolate (for printing results). The header could either be “Isolate” or “Strain”   * second column: unique ID label for each isolate (for printing results). The header could either be “Isolate” or “Strain”
   * third column (optional): provides a species or similar group name for the isolates   * third column (optional): provides a species or similar group name for the isolates
   * remaining columns: genes for which there are aligned sequences available   * remaining columns: genes for which there are aligned sequences available
 +  * if a header is given, the columns can be in a different order
  
 **example:** <code> **example:** <code>
Line 171: Line 208:
  
 \\ \\
-=== BASP data format: === +=== BAPS data format: ===
   * haploid marker data (single data row per individual)   * haploid marker data (single data row per individual)
   * diploid marker data (two rows per individual)   * diploid marker data (two rows per individual)
Line 177: Line 214:
  
 \\ \\
-  * numeric data input format or a direct sequence based format: +numeric data input format or a direct sequence based format: 
-    * numeric format: replacing each of A,C,G,T with a unique integer and missing values with a negative integer (-999). Individual Index after the sequence separated by a space +  * **numeric format:** 
-    * sequence format: Individual Index after the sequence separated by space +    * replacing each of A,C,G,T with a unique integer and missing values with a negative integer (-999) 
 +    * Individual Index after the sequence separated by a space 
 +    * example: a single data row for individual 110 with sequence AACCG-T could look like this: <code>
 +65 65 67 67 71 -999 84 110 
 +</code>
  
-**example** (diploid): <code> +  * **sequence format:** 
 +    * Individual Index after the sequence separated by a space 
 +    * example (diploid): <code>
 ATTTGCCTACGTAGCCAATT 1 ATTTGCCTACGTAGCCAATT 1
 TTACCGACCTTAAAAACCTT 1 TTACCGACCTTAAAAACCTT 1
Line 188: Line 231:
 </code> </code>
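
The numeric example row for individual 110 uses 65, 67, 71 and 84 for A, C, G and T, which coincide with the ASCII codes of those letters. The sketch below relies on that observation to convert a sequence into a numeric row; whether BAPS requires exactly these integers is not stated here, so treat the mapping as an assumption. The function name `to_numeric_row` is hypothetical.

```python
def to_numeric_row(sequence, individual, missing=-999):
    """Encode one sequence as a numeric BAPS-style data row.

    A, C, G and T become their ASCII codes (an assumption taken from the
    example row); every other symbol, such as a gap, becomes `missing`.
    The individual index is appended as the last column.
    """
    codes = []
    for base in sequence.upper():
        if base in "ACGT":
            codes.append(ord(base))
        else:
            codes.append(missing)
    return " ".join(str(value) for value in codes + [individual])


row = to_numeric_row("AACCG-T", 110)
print(row)  # 65 65 67 67 71 -999 84 110
```

For diploid data the function would be called once per row, with the same individual index for both rows of an individual.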
  
-  * separate file of gene boundaries: +\\ 
 +  In contrast to the MLST format, the BAPS format requires you to concatenate the sequences of all considered genes into a single sequence and to specify the gene boundaries in a separate file. Separate file of gene boundaries:
     * number of rows equals the number of genes     * number of rows equals the number of genes
     * at each row, the integers refer to those columns of the data matrix that correspond to the specific gene     * at each row, the integers refer to those columns of the data matrix that correspond to the specific gene
 +    * additional zeros are used to pad the rows so that all rows have an equal number of columns
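
A minimal Python sketch of how the concatenated sequence and the matching gene boundary rows could be produced from per-gene sequences. It assumes 1-based column numbers and zero padding at the end of each row; the function name `concatenate_with_boundaries` and the short sample sequences are hypothetical.

```python
def concatenate_with_boundaries(gene_sequences):
    """Join per-gene sequences and build the gene boundary matrix.

    Returns the concatenated sequence and one row per gene listing the
    1-based data matrix columns of that gene, padded with zeros so that
    all rows have the same length (padding position is an assumption).
    """
    if not gene_sequences:
        raise ValueError("at least one gene sequence is required")
    concatenated = "".join(gene_sequences)
    width = max(len(seq) for seq in gene_sequences)
    rows = []
    start = 1  # first data matrix column of the current gene
    for seq in gene_sequences:
        columns = list(range(start, start + len(seq)))
        rows.append(columns + [0] * (width - len(columns)))
        start += len(seq)
    return concatenated, rows


# Three toy genes of length 3, 2 and 4.
sequence, boundaries = concatenate_with_boundaries(["ATG", "CC", "GGTA"])
for row in boundaries:
    print(" ".join(str(col) for col in row))
```

The same boundary rows apply to every individual, so they only need to be computed once from the aligned gene lengths.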
  
 **example** (“linkage map”: 3 genes, the first corresponding to columns 1-10 in the data matrix, and so on. Additional zeros result in a matrix with an equal number of columns in each row): <code> **example** (“linkage map”: 3 genes, the first corresponding to columns 1-10 in the data matrix, and so on. Additional zeros result in a matrix with an equal number of columns in each row): <code>
Line 234: Line 279:
  
 ===== How to cite ===== ===== How to cite =====
 +Tang, J., Hanage, W.P., Fraser, C. and Corander, J. (2009). Identifying currents in the gene pool for bacterial populations using an integrative approach, PLoS Computational Biology, 5(8), e1000455.
 +\\
 Corander, J., Waldmann, P., Marttinen, P. and Sillanpää, M.J. (2004).  BAPS 2: enhanced possibilities for the analysis of genetic population structure, Bioinformatics,  20, 2363-2369. Corander, J., Waldmann, P., Marttinen, P. and Sillanpää, M.J. (2004).  BAPS 2: enhanced possibilities for the analysis of genetic population structure, Bioinformatics,  20, 2363-2369.
  
baps.1213602602.txt.gz · Last modified: 2008/07/22 13:30 (external edit)