Version 5.4 (29.04.2010)
A program for Bayesian inference of the genetic structure in a population. It assigns individuals to genetic clusters by treating them either as immigrants (mixture analysis) or as descendants of immigrants (admixture analysis).
example (cluster 5 diploid individuals. The first individual has alleles 5 and 7 at the first locus and so on. Individuals 1, 2 and 3 were sampled in America and individuals 4 and 5 in Europe):
5 2 1
7 2 1
5 8 2
3 9 2
2 5 3
-999 5 3
5 -999 4
2 3 4
3 8 5
2 5 5
American European
1 4
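The layout of this example can be illustrated with a short Python sketch. It is not part of the program: the function names are hypothetical, and it assumes that each row holds one allele per locus followed by the individual's index, that -999 marks a missing allele, and that the index file lists the first individual of each named population.

```python
from collections import defaultdict

MISSING = -999  # missing-allele code as it appears in the example data


def parse_individual_rows(text, n_loci):
    """Group allele rows by the individual index stored in the last column."""
    tokens = [int(t) for t in text.split()]
    width = n_loci + 1  # n_loci allele columns plus one index column
    if len(tokens) % width != 0:
        raise ValueError("token count is not a multiple of the row width")
    individuals = defaultdict(list)
    for start in range(0, len(tokens), width):
        *alleles, index = tokens[start:start + width]
        individuals[index].append(
            [None if a == MISSING else a for a in alleles]
        )
    return dict(individuals)


def assign_populations(n_individuals, names, first_indices):
    """Label every individual using the population names and index files."""
    bounds = list(first_indices) + [n_individuals + 1]
    labels = {}
    for name, first, stop in zip(names, bounds, bounds[1:]):
        for individual in range(first, stop):
            labels[individual] = name
    return labels


data = "5 2 1 7 2 1 5 8 2 3 9 2 2 5 3 -999 5 3 5 -999 4 2 3 4 3 8 5 2 5 5"
genotypes = parse_individual_rows(data, n_loci=2)
populations = assign_populations(
    len(genotypes), ["American", "European"], [1, 4]
)
```

Under these assumptions, individual 1 carries alleles 5 and 7 at the first locus, and individuals 1 to 3 are labelled American while 4 and 5 are labelled European, which matches the description of the example.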
example (data from four distinct groups)
5 2 1
7 2 1
5 8 1
3 9 2
2 5 2
-999 5 3
5 -999 4
2 3 4
3 8 4
2 5 4
American European African Asian
Must provide two data files:
example (reference data from two populations (s, r). We wish to cluster three sampling units (unit 1: ind1, …). If no relevant information is available for pre-grouping the data to be clustered, every individual should be its own sampling unit in the input data set):
--individuals with known origins--
loc1, loc2
pop
s1, 0307 0202
s2, 0303 0201
pop
r1, 0502 0401
r2, 0200 0404
--sampling units--
loc1, loc2
pop
ind1, 0404 0304
pop
ind2, 0307 0202
ind3, 0303 0102
pop
ind4, 0505 0404
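The two files above can be read with a few lines of Python. This is only a sketch with hypothetical function names. It assumes that the layout follows the GENEPOP convention (a title line, locus names, then blocks introduced by "pop"), that each genotype is two equal-width allele codes written together, and that the code 00 denotes a missing allele.

```python
def split_genotype(code):
    """Split a code such as '0307' into two alleles; 0 becomes None."""
    half = len(code) // 2
    return tuple(int(part) or None for part in (code[:half], code[half:]))


def parse_genepop(text):
    """Return (title, locus names, list of {individual: genotypes} per pop)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    title = lines[0]
    loci = []
    pops = []
    for line in lines[1:]:
        if line.lower() == "pop":
            pops.append({})  # a new block of individuals starts here
        elif not pops:
            # Lines before the first "pop" hold the locus names.
            loci.extend(n.strip() for n in line.split(",") if n.strip())
        else:
            name, _, genotypes = line.partition(",")
            pops[-1][name.strip()] = [
                split_genotype(g) for g in genotypes.split()
            ]
    return title, loci, pops


REFERENCE = """--individuals with known origins--
loc1, loc2
pop
s1, 0307 0202
s2, 0303 0201
pop
r1, 0502 0401
r2, 0200 0404
"""

title, loci, pops = parse_genepop(REFERENCE)
```

Here `pops` has one dictionary per "pop" block, so the reference file yields two populations, and the genotype 0200 of r2 is read as allele 2 plus one missing allele under the stated assumption.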
Same as the first two above, except for the coordinate values that need to be given in a separate file:
example:
172 88
155 96
180 78
0 0
-18 81
example:
ST  Isolate  Species         Adk  GyrB  Hsp60  Mdh  Pgi  RecA
1   1A1      My.Splendidone  1    1     1      1    1    1
2   1B1      A.dent          2    2     2      2    2    2
…
For each chosen gene a corresponding FASTA file containing the aligned sequences for all included isolates is needed:
example:
>RecA-2
CTAGGGCTTTAACCC--CATTTGCAGTACTGTCATGTCAGTGTACTATTTCAC
>RecA-2
CTAGGGCTTT-ACCCT-CATTTGCAGTACTGCCATGTCACTGTACTAATTCAC
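Because the sequences in each gene file must be aligned, it can be useful to verify that all records have the same length before loading the data. The following Python sketch does this; the function names are hypothetical and the check is only an illustration, not something the program is known to perform in this way. Records are kept in a list rather than a dictionary so that repeated header names, as in the example, are not overwritten.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])  # start a new record
        elif records:
            records[-1][1] += line  # sequences may span several lines
    return [(name, seq) for name, seq in records]


def is_aligned(records):
    """True if every sequence (gaps included) has the same length."""
    return len({len(seq) for _, seq in records}) <= 1


FASTA = """>RecA-2
CTAGGGCTTTAACCC--CATTTGCAGTACTGTCATGTCAGTGTACTATTTCAC
>RecA-2
CTAGGGCTTT-ACCCT-CATTTGCAGTACTGCCATGTCACTGTACTAATTCAC
"""

records = read_fasta(FASTA)
aligned = is_aligned(records)
```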
numeric data input format or a direct sequence-based format:
65 65 67 67 71 -999 84 110
ATTTGCCTACGTAGCCAATT 1
TTACCGACCTTAAAAACCTT 1
ATTTCCCAAAGGGTTTAAAA 2
TAACCGGACATAGCCAATAA 2
example (“linkage map”: 3 genes, the first corresponding to columns 1-10 in the data matrix, and so on. Additional zeros pad the rows so that the matrix has an equal number of columns in each row):
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 0
20 21 22 23 24 25 26 27 0 0
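The padding rule can be made concrete with a small Python sketch. It assumes one row per gene, 1-based column numbers, and zeros used purely as padding; the function name is hypothetical.

```python
def parse_linkage_map(text):
    """Return one list of data-matrix column numbers per gene, without padding."""
    genes = []
    for line in text.strip().splitlines():
        columns = [int(token) for token in line.split()]
        genes.append([c for c in columns if c != 0])  # drop the zero padding
    return genes


LINKAGE_MAP = """1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 0
20 21 22 23 24 25 26 27 0 0
"""

genes = parse_linkage_map(LINKAGE_MAP)
gene_sizes = [len(g) for g in genes]
```

For the example map this gives three genes covering 10, 9 and 8 columns of the data matrix.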
Binary result file of mixture clustering
example (the first two individuals are assumed to form one cluster whose ID label is 1, individual 3 is not pre-assigned to either cluster, and so on):
1 1 -1 2 2
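The pre-assignment labels can be read as follows. This Python sketch uses a hypothetical function name and assumes that the labels are listed in the same order as the individuals, numbered from 1, and that -1 marks an individual left unassigned.

```python
def group_preassigned(labels, unassigned=-1):
    """Split individuals into pre-assigned clusters and a list of free ones."""
    clusters = {}
    free = []
    for individual, label in enumerate(labels, start=1):
        if label == unassigned:
            free.append(individual)
        else:
            clusters.setdefault(label, []).append(individual)
    return clusters, free


clusters, free = group_preassigned([1, 1, -1, 2, 2])
```

With the example labels, cluster 1 holds individuals 1 and 2, cluster 2 holds individuals 4 and 5, and individual 3 remains free.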
Tang J, Hanage WP, Fraser C, Corander J. (2009). Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Computational Biology, 5(8): e1000455.
Corander J, Waldmann P, Marttinen P, Sillanpää MJ. (2004). BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics, 20: 2363-2369.