
====== MIGRATE ======
{{migrate.gif?150}}
\\
**[[http://popgen.sc.fsu.edu/Migrate/Migrate-n.html|MIGRATE]]**\\
[[http://popgen.sc.fsu.edu/migratedoc.pdf|documentation]] \\
Version 3.2.6 (13. October 2010)\\

Migrate estimates population parameters (effective population sizes and migration rates) of n populations, using genetic data. It uses a coalescent theory approach that takes into account the history of mutations and the uncertainty of the genealogy. The parameter values are estimated by either a maximum likelihood (ML) approach or Bayesian inference (BI).

===== Program information =====
  * MacOS X
  * Linux
  * Sun Solaris
  * Windows

===== Data type handled =====
  * DNA sequence
  * SNP
  * Microsatellite
  * Standard (Electrophoretic marker)

===== Input Files =====
Syntax:
  * < token >: the token is obligatory
  * [token]: optional
  * {token}: obligatory for some kinds of data
  * < token1|token2 >: choose one of the tokens
  * ''10-10'' after a token name: means that this token needs to be 10 characters long
  * The characters for any word token can normally include special characters, punctuation, and blanks (e.g.: ''Ind1 02 @ '' is legal)

\\
**enzyme electrophoretic data or microsatellite data** would look like this:


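<code>
<Number of populations> <number of loci> {delimiter between alleles} [project title 0-79]
<Number of individuals> <title for population 0-79>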
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....
<Number of individuals> <title for population 0-79>
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....
</code>
  * the delimiter is needed for microsatellite data 
  * the project title is optional
  * by default, the individual name has to be exactly 10 characters long
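
The layout of an individual line (a 10-character name followed by one genotype per locus) can be illustrated with a small parser. This is only a sketch under the rules stated above and not code from MIGRATE; the function name ''parse_individual'' and its parameters are hypothetical.

```python
def parse_individual(line, delimiter=None, name_width=10):
    """Split one individual line into (name, list of genotypes).

    Assumption of this sketch: the first `name_width` characters hold the
    name, and the genotypes that follow are separated by blanks.
    """
    name = line[:name_width].strip()
    genotypes = []
    for token in line[name_width:].split():
        if delimiter is None:
            # Without a delimiter every allele is a single character, e.g. "ab".
            alleles = tuple(token)
        else:
            # With a delimiter alleles may be longer, e.g. "Rs/Rf".
            alleles = tuple(token.split(delimiter))
        genotypes.append(alleles)
    return name, genotypes


name, genotypes = parse_individual("PB1060    ee bb b? bb ab")
print(name, genotypes[2])
```

With a delimiter, ''parse_individual("PB1058    e/e Rs/Rf", delimiter="/")'' keeps the multi-character allele names ''Rs'' and ''Rf'' intact.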

\\
**sequences or SNPs** 
  * non-interleaved data: <code>
<Number of populations> <number of loci> [project title 0-79]
<number of sites for locus 1> <number of sites for locus 2> ...
<Number of individuals locus 1> <#ind locus 2> ... <#ind loc n> <title for population 0-79>
<Individual 1 10-10> <data locus 1>
<Individual 2 10-10> <data locus 1>
....
<Individual 1 10-10> <data locus 2>
<Individual 2 10-10> <data locus 2>
....
<Number of individuals locus 1> <#ind locus 2> ... <#ind loc n> <title for population 0-79>
<Individual 1 10-10> <data locus 1>
<Individual 2 10-10> <data locus 1>
....
<Individual 1 10-10> <data locus 2>
<Individual 2 10-10> <data locus 2>
....
</code>
  * interleaved data (no longer supported by MIGRATE): <code>
<Number of populations> <number of loci> [project title 0-79]
<number of sites for locus 1> <number of sites for locus 2> ...
<Number of individuals locus 1> <#ind locus 2> ... <#ind loc n> <title for population 0-79>
<Individual 1 10-10> <data locus 1 part 1>
<Individual 2 10-10> <data locus 1 part 1>
....
<data ind1 locus 1 part 2>
<data ind2 locus 1 part 2>
....
<Individual 1 10-10> <data locus 2 part 1>
<Individual 2 10-10> <data locus 2 part 1>
....
<data ind1 locus 2 part 2>
<data ind2 locus 2 part 2>
....
</code>

**SNPs in HapMap format**: 
  * assumes that each SNP is biallelic
  * <allele> contains the nucleotide
  * <number> contains the number of individuals with that allele
  * <total> is the sum of the two <number> values <code>
<Number of populations> <number of loci> [project title 0-79]
<Any Number> <title for population 0-79>
<Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
<Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
....
<Any Number> <title for population 0-79>
<Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
<Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
....
</code>
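
A single SNP line in this layout can be read and checked as sketched below. The function name ''parse_hapmap_line'' is hypothetical, and splitting on any whitespace (rather than strictly on TAB) is an assumption made so that the space-separated examples on this page also parse.

```python
def parse_hapmap_line(line):
    """Parse '<position> <allele> <number> <allele> <number> <total>'.

    Returns (position, {allele: count}, total) and raises ValueError when
    the line does not have six fields or the counts do not add up.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}: {line!r}")
    position = int(fields[0])
    allele1, count1 = fields[1], int(fields[2])
    allele2, count2 = fields[3], int(fields[4])
    total = int(fields[5])
    # <total> has to be the sum of the two allele counts.
    if count1 + count2 != total:
        raise ValueError(f"counts {count1} + {count2} do not sum to {total}")
    return position, {allele1: count1, allele2: count2}, total


print(parse_hapmap_line("1011\tA\t2\tG\t1\t3"))
```

Raising an error on a wrong total is a design choice of this sketch; it catches hand-editing mistakes before the file is handed to the program.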












==== Examples ====
=== Enzyme electrophoretic data (infinite allele model) ===
  * Genotypes
  * missing data: "?"
  * can use multi-character coding when you use a delimiter
  * 2 populations and 11 loci, with 3 and 2 individuals per population: <code>
 2 11 Migration rates between two Turkish frog populations
3 Akcapinar (between Marmaris and Adana)
PB1058    ee bb ab bb bb aa aa bb ?? cc aa
PB1059    ee bb ab bb bb aa aa bb bb cc aa
PB1060    ee bb b? bb ab aa aa bb bb cc aa
2 Ezine (between Selcuk and Dardanelles)
PB16843   ee bb ab bb aa aa aa cc bb cc aa
PB16844   ee bb bb bb ab aa aa cc bb cc aa
</code>
  * same data but with ''/'' as delimiter (the last locus uses multi-character allele names): <code>
 2 11 / Migration rates between two Turkish frog populations
3 Akcapinar (between Marmaris and Adana)
PB1058    e/e b/b a/b b/b b/b a/a a/a b/b ?/? c/c Rs/Rf
PB1059    e/e b/b a/b b/b b/b a/a a/a b/b b/b c/c Rs/Rs
PB1060    e/e b/b b/? b/b a/b a/a a/a b/b b/b c/c Rs/Rs
2 Ezine (between Selcuk and Dardanelles)
PB16843   e/e b/b a/b b/b a/a a/a a/a c/c b/b c/c Rf/Rf
PB16844   e/e b/b b/b b/b a/b a/a a/a c/c b/b c/c Rf/Rs</code>


=== Microsatellite data ===
  * The third argument on the first line has to be a delimiter character (e.g. ".")
  * Genotypes
  * Each individual has two alleles
  * a homozygous individual needs to be coded as e.g. 6.6 ("." is the delimiter)
  * missing data: "?"
  * Alleles are coded as REPEAT NUMBERS: <code>
 2 3 . Rana lessonae: Seeruecken versus Tal
2   Riedtli near Guendelhart-Hoerhausen
0         42.45 37.31 18.18
0         42.45 37.33 18.16
4   Tal near Steckborn
1         43.46 33.37 18.18
1         44.46 33.35 19.18
1         44.46 35.? 18.18
1         43.42 35.31 20.18
</code>
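
To illustrate how the delimiter and the missing-data symbol interact, the following sketch counts the alleles at one locus of the repeat-number example (locus 2 of the population "Tal near Steckborn"). The helper ''allele_counts'' is hypothetical and not part of MIGRATE.

```python
from collections import Counter


def allele_counts(genotype_tokens, delimiter="."):
    """Count repeat-number alleles over genotypes such as '33.37'.

    Alleles coded as '?' are treated as missing and skipped.
    """
    counts = Counter()
    for token in genotype_tokens:
        for allele in token.split(delimiter):
            if allele != "?":
                counts[int(allele)] += 1
    return counts


# Second locus of the four individuals from "Tal near Steckborn".
tal_locus2 = ["33.37", "33.35", "35.?", "35.31"]
print(allele_counts(tal_locus2))
```

Here seven alleles are counted instead of eight, because the third individual has one missing allele.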
  * Alleles encoded as FRAGMENT LENGTH:
    * requires an extra line, starting with ''#M'', that gives the repeat number for each locus <code>
 2 3 . Rana lessonae: Seeruecken versus Tal
#M 2 2 2
2 Riedtli near Guendelhart-Hoerhausen
0         25.27 137.131 218.218
0         27.27 218.216
2 Tal near Steckborn
1         23.25 135.? 218.218
1         23.23 135.131 220.218
</code>    

=== Sequence data ===
  * The individual name is followed by the base sequence of that individual
  * each character is one of the letters A, B, C, D, G, H, K, M, N, O, R, S, T, U, V, W, X, Y, ?, or -
  * Blanks are ignored (this allows GenBank and EMBL sequence entries to be read with minimum editing)
  * characters can be either upper or lower case
  * the characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions:

^ Symbol ^ Meaning ^ 
|    A   | Adenine |
|    G   | Guanine |
|    C   | Cytosine |
|    T   | Thymine |
|    U   | Uracil |
|    Y   | pYrimidine (C or T) |
|    R   | puRine (A or G) |
|    W   | "Weak" (A or T) |
|    S   | "Strong" (C or G) |
|    K   | "Keto" (T or G) |
|    M   | "aMino" (C or A) |
|    B   | not A (C or G or T) |
|    D   | not C (A or G or T) |
|    H   | not G (A or C or T) |
|    V   | not T (A or C or G) |
|  X,N,? | unknown (A or C or G or T) |
|    O   | deletion |
|    -   | deletion |
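
The table can be turned into a lookup for checking sequences before a run. This is a minimal sketch: the names ''IUPAC_CODES'', ''possible_bases'' and ''is_valid_sequence'' are hypothetical, and representing the two deletion symbols as an empty set is an assumption of the sketch.

```python
# Symbols from the table, mapped to the bases they can stand for.
# Assumption: the deletion symbols "O" and "-" map to an empty set.
IUPAC_CODES = {
    "A": {"A"}, "G": {"G"}, "C": {"C"}, "T": {"T"}, "U": {"U"},
    "Y": {"C", "T"}, "R": {"A", "G"},
    "W": {"A", "T"}, "S": {"C", "G"},
    "K": {"T", "G"}, "M": {"C", "A"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "X": {"A", "C", "G", "T"}, "N": {"A", "C", "G", "T"},
    "?": {"A", "C", "G", "T"},
    "O": set(), "-": set(),
}


def possible_bases(symbol):
    """Return the set of bases a symbol can represent (case-insensitive)."""
    return IUPAC_CODES[symbol.upper()]


def is_valid_sequence(sequence):
    """True if every non-blank character is one of the symbols in the table."""
    cleaned = sequence.replace(" ", "")  # blanks are ignored
    return all(ch.upper() in IUPAC_CODES for ch in cleaned)


print(possible_bases("y"), is_valid_sequence("ACGT nnry-?"))
```

A sequence such as ''ACGZ'' fails the check because ''Z'' is not in the table.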

**examples:**
  * two populations with a single DNA-locus: <code>
2 1 Make believe data set using simulated data (1 locus)
50
3 Tallahassee (Mars)
Peter     ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
Donald    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
Christian ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
3 St. Marks
Lucrezia  ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
Isabel    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
Yasmine   ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
</code>
  * not interleaved (2 populations with 2 loci): <code>
   2 2 Make believe data set using simulated data (2 loci)
50 46
3 3   pop1
eis       ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
zwo       ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
drue      ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
eis       ACGCGGCGCGCGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
zwo       ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
drue      ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
2   pop2
vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATGGTAGAGCGGAG
fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATGTTAGAGCGGCA
vier      TCGACTAGATCTGCAGCACATACGAGGGTCATGCGTCCCAGATGTG
fuefLoc2  TCGACTAGATATGCAGCAAATACGAGGGGCATGCGTCCCAGATGTG
</code>
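
Because names have to occupy exactly 10 characters, it can help to generate population blocks programmatically instead of aligning them by hand. The sketch below writes one population in the non-interleaved layout; ''format_population'' is a hypothetical helper, and truncating names longer than 10 characters is an assumption of the sketch.

```python
def format_population(title, loci, name_width=10):
    """Return the lines for one population in non-interleaved layout.

    `loci` holds one list per locus, each a list of (name, sequence) pairs.
    The first line lists the number of individuals per locus and the title.
    """
    counts = " ".join(str(len(locus)) for locus in loci)
    lines = [f"{counts} {title}"]
    for locus in loci:
        for name, sequence in locus:
            # Pad (or truncate) the name to exactly `name_width` characters.
            lines.append(f"{name[:name_width]:<{name_width}}{sequence}")
    return lines


pop2 = [
    [("vier", "CAGCGC"), ("fuef", "CAGCGA")],  # locus 1
    [("vier", "TCGA"), ("fuef", "TCGT")],      # locus 2
]
print("\n".join(format_population("pop2", pop2)))
```

The sequences in this usage example are shortened placeholders, not the data from the example above.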
  * interleaved (2 populations with 2 loci; no longer supported by MIGRATE): <code>
   2 2 Make believe data set using simulated data (2 loci, interleaved)
50 46
3 2   pop1
eis       ACACCCAACACGGCCCGCGGACA
zwo       ACACAAAACACGGCCCGCGGACA
drue      ATACCCAGCACGGCCGGCGGACA
          GGGGCTCGAGGGATCACTGACTGGCAC
          GGGGCTCGAGGGGTCACTGAGTGGCAC
          GGGGCTCGAGGGAGCACTGAGTGGAAC
eis       ACGCGGCGCGCGAACGAAGACCA
zwo       ACGCGGCGCGAGAACGAAGACCA
          AATCTTCTTGATCCCCAAGTGTC
          AATCTTCTTGATCCCCAAGTGTC
2 2 pop2
vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATG
fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATG
          GTAGAGCGGAG
          TTAGAGCGGCA
          TCGACTAGATCTG CAGCACATAC
          TCGACTAGATATG CAGCAAATAC
          GAGGGTCATGCGTCCCAGATGTG
          GAGGGGCATGCGTCCCAGATGTG
</code>

=== SNP data ===
  * uses the same nucleotide nomenclature as the sequence data
  * same format as sequence data
  * linked SNP: more than one site on one line
  * unlinked SNP: one site per line
  * two formats:
    * ''N'': nucleotide format <code>
N 2 2 Make believe data set using simulated data (2 population and 2 loci)
1 4
3 3 pop1
ind1      A
ind2      A
ind3      A
ind1      ACAC
ind2      ACAC
ind3      ACGC
2 pop2
ind4      C
ind5      C
ind4      TGGA
ind5      TCGA
</code>
    * ''H'': HapMap format <code>
# uses the HapMap data format, but does not (yet) produce the same result as the dataset above
H 2 2 Make believe data set using simulated data (2 population and 2 loci)
3 pop1
1    A 3 C 0 3
1000 A 3 T 0 3
1010 C 3 G 0 3
1011 A 2 G 1 3
1015 C 3 A 0 3
2 pop2
1    A 0 C 2 2
1000 A 0 T 2 2
1010 C 1 G 1 2
1011 A 0 G 2 2
1015 C 0 A 2 2
</code>

===== How to cite =====
Beerli, P. (2009) How to use MIGRATE or why are Markov chain Monte Carlo programs difficult to use? In G. Bertorelle, M. W. Bruford, H. C. Hauffe, A. Rizzoli, and C. Vernesi, editors, Population Genetics for Animal Conservation, volume 17 of Conservation Biology, pages 42-79. Cambridge University Press, Cambridge, UK.