Differences

This shows you the differences between two versions of the page.

--- baps [2008/06/16 09:50] – heidi
+++ baps [2013/02/20 13:24] (current) – heidi
@@ Line 4: / Line 4: @@
 \\
-Version 5.1\\
+Version 5.4 (29.04.2010)\\
 A program for Bayesian inference of the genetic structure in a population. Assigns individuals to genetic clusters by considering them either as immigrants (mixture analysis) or as descendants of immigrants (admixture analysis).
 ===== Program information =====
-  * Windows XP/2000/Vista
+  * Windows XP/Vista/7 (32-bit, 64-bit)
-  * Mac OS X
+  * Mac OS X Snow Leopard (64-bit)
-  * Linux
+  * Linux (32-bit)
@@ Line 17: / Line 22: @@
 ===== Data type handled =====
   * haploid/diploid/(tetraploid)
-  * SNP
+  * DNA
+  * SNP (sequence/numeric)
   * AFLP
   * Microsatellite
-  * multi-allelic markers
+  * Standard (multi-allelic markers)
@@ Line 72: / Line 78: @@
 \\
@@ Line 82: / Line 90: @@
   * Last column contains the index of the group that is the origin of the alleles on the particular row (instead of specifying the individual)
   * the names of the groups can be given in a separate file
+\\
+**example** (data from four distinct groups)
+  * data file:<code>
+     2     1
+     2     1
+     8     1
+     9     2
+     5     2
+-999  5     3
+     -999  4
+     3     4
+     8     4
+     5     4
+</code>
+  * name file:<code>
+American
+European
+African
+Asian
+</code>
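A minimal Python sketch of how a group-wise data file and its name file could be combined. The function name `read_groupwise` and the sample data are hypothetical, and the sketch assumes that fields are separated by whitespace, that the group index in the last column is 1-based, and that the name file lists one group name per line in index order.

```python
def read_groupwise(data_text, names_text):
    """Collect allele rows per group name (last column = group index)."""
    names = [line.strip() for line in names_text.splitlines() if line.strip()]
    by_group = {}
    for line in data_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # All columns but the last are alleles; the last is the group index.
        *alleles, group = (int(f) for f in fields)
        # Assumption: group indices start at 1 and follow the name file order.
        by_group.setdefault(names[group - 1], []).append(alleles)
    return by_group


# Hypothetical one-locus data for two groups; -999 marks a missing allele.
data = "2 1\n8 1\n-999 2\n5 2\n"
names = "American\nEuropean\n"
print(read_groupwise(data, names))
```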
 === GENEPOP format: ===
@@ Line 123: / Line 153: @@
 \\
 ==== Spatial clustering: ====
-Same as above, except for the coordinate values that need to be given in a separate file:
+Same as the first two above, except for the coordinate values that need to be given in a separate file:
-  * as many rows as there are individuals (spatial clustering of individuals) or groups (spatial clustering of groups) in the molecular data set.
+  * as many rows as there are individuals (spatial clustering of individuals -> sampling coordinates of each individual) or groups (spatial clustering of groups -> sampling coordinates of each group) in the molecular data set.
   * missing coordinate: two consecutive zeros
 \\
 **example:**
-  * Data flie: see first example
+  * Data file: see first example
   * Coordinate file: <code>
   88
@@ Line 142: / Line 176: @@
 \\
 ==== Clustering of linked molecular data (sequence data): ====
 === MLST data format: ===
+  * for prokaryotic organisms
   * first column: identifier, numbered consecutively from 1 to the number of isolates (unique for each isolate)
   * second column: unique ID label for each isolate (for printing results). The header can be either “Isolate” or “Strain”
   * third column (optional): provides a species or similar group name for the isolates
   * remaining columns: genes for which there are aligned sequences available
+  * if a header is given, the columns can be in a different order
 **example:** <code>
@@ Line 171: / Line 208: @@
 \\
-=== BASP data format: ===
+=== BAPS data format: ===
   * haploid marker data (single data row per individual)
   * diploid marker data (two rows per individual)
@@ Line 177: / Line 214: @@
 \\
-  * numeric data input format or a direct sequence based format:
+numeric data input format or a direct sequence based format:
-    * numeric format: replacing each of A,C,G,T with a unique integer and missing values with a negative integer (-999). Individual Index after the sequence separated by a space
+  * **numeric format:**
-    * sequence format: Individual Index after the sequence separated by a space
+    * replacing each of A,C,G,T with a unique integer and missing values with a negative integer (-999)
+    * Individual Index after the sequence separated by a space
+    * example: a single data row for individual 110 with sequence AACCG-T could look like this: <code>
+65 65 67 67 71 -999 84 110
+</code>
-**example** (diploid): <code>
+  * **sequence format:**
+    * Individual Index after the sequence separated by a space
+    * example (diploid): <code>
 ATTTGCCTACGTAGCCAATT 1
 TTACCGACCTTAAAAACCTT 1
@@ Line 188: / Line 231: @@
 </code>
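The relation between the two input formats can be sketched in Python as follows. This is an illustration only: the helper name `to_numeric_row` is hypothetical, and it assumes that the ASCII code of each letter serves as the unique integer (any other unique integers would satisfy the rule equally well) and that every symbol other than A, C, G, T counts as missing.

```python
MISSING = -999  # missing-value code named in the format description


def to_numeric_row(sequence, individual):
    """Encode one sequence as a numeric data row ending with the individual index."""
    codes = []
    for base in sequence.upper():
        if base in "ACGT":
            codes.append(ord(base))  # assumption: ASCII code as the unique integer
        else:
            codes.append(MISSING)  # assumption: gaps and other symbols are missing
    return " ".join(str(c) for c in codes + [individual])


print(to_numeric_row("ACG-T", 7))  # 65 67 71 -999 84 7
```

Diploid data would need two such rows per individual, both ending with the same individual index.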
-  * separate file of gene boundaries:
+\\
+  * In contrast to the MLST format, the BAPS format requires you to concatenate the sequences from all considered genes into a single sequence and to specify the gene boundaries in a separate file. The separate file of gene boundaries has the following layout:
     * number of rows equals the number of genes
     * at each row, the integers refer to those columns of the data matrix that correspond to the specific gene
+    * Additional zeros are used to fill the rows so that all rows have an equal number of columns
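The layout of the gene boundary file can be sketched in Python as follows. The function name `linkage_map_rows` is hypothetical, and the sketch assumes that the genes were concatenated in order without gaps and that column indices are 1-based.

```python
def linkage_map_rows(gene_lengths):
    """Return one row of column indices per gene, zero-padded to equal width."""
    width = max(gene_lengths)  # the longest gene sets the number of columns
    rows, start = [], 1
    for length in gene_lengths:
        columns = list(range(start, start + length))
        rows.append(columns + [0] * (width - length))  # pad short rows with zeros
        start += length  # the next gene begins right after this one
    return rows


# Hypothetical genes of 3, 5 and 2 columns in the concatenated data matrix.
for row in linkage_map_rows([3, 5, 2]):
    print(" ".join(str(c) for c in row))
```

For these lengths the rows are `1 2 3 0 0`, `4 5 6 7 8` and `9 10 0 0 0`.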
 **example** (“linkage map”: 3 genes, the first corresponding to columns 1-10 of the data matrix, and so on. Additional zeros pad the rows so that every row has the same number of columns): <code>
@@ Line 234: / Line 279: @@
 ===== How to cite =====
+Tang J, Hanage WP, Fraser C, Corander J. (2009). Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Computational Biology, 5(8): e1000455.
+\\
 Corander, J., Waldmann, P., Marttinen, P. and Sillanpää, M.J. (2004). BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics, 20, 2363-2369.