MIGRATE documentation

Version 3.2.6 (13 October 2010)
Migrate estimates population parameters (effective population sizes and migration rates) of n populations using genetic data. It uses a coalescent-theory approach that takes into account the history of mutations and the uncertainty of the genealogy. Parameter values are estimated using either a maximum likelihood (ML) approach or Bayesian inference (BI).

Program information

MacOS X
Linux
Sun Solaris
Windows

Data types handled

DNA sequence
SNP
Microsatellite
Standard (Electrophoretic marker)

Input Files

Syntax:

<token>: the token is obligatory
[token]: the token is optional
{token}: the token is obligatory for some kinds of data
<token1|token2>: choose one of the tokens
<individual1 10-10>: the token needs to be exactly 10 characters long (a range such as 0-79 means up to 79 characters)
Word tokens can normally include special characters, punctuation, and blanks (e.g. "Ind1 02 @" is legal)
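
Because names may contain blanks and punctuation, a data line cannot be split on whitespace alone; the name has to be taken by position. The following is a minimal Python sketch of that idea, not MIGRATE's own code. The function name `split_record` and the constant `NAME_WIDTH` are hypothetical and chosen only for this illustration.

```python
NAME_WIDTH = 10  # default name length given in the syntax notes


def split_record(line, name_width=NAME_WIDTH):
    """Split one data line into (individual name, data).

    The name is cut out by column position, not by whitespace,
    because a name may itself contain blanks and punctuation.
    """
    line = line.rstrip("\n")
    if len(line) < name_width:
        raise ValueError(f"line is shorter than the {name_width}-character name field")
    name = line[:name_width].strip()
    data = line[name_width:].strip()
    return name, data


print(split_record("PB1058    ee bb ab bb"))
print(split_record("Ind1 02 @ ee bb ab bb"))
```

Padding short names with blanks up to column 10 keeps the data aligned and makes this positional split unambiguous.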

enzyme electrophoretic data or microsatellite data would look like this:

<Number of populations> <number of loci> {delimiter between alleles} [project title 0-79]
<Number of individuals> <title for population 0-79>
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....
<Number of individuals> <title for population 0-79>
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....

the delimiter is required for microsatellite data
the project title is optional
by default, the individual name has to be exactly 10 characters long
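
The layout described above can be read with a few lines of Python. This is a hedged sketch, not MIGRATE's parser: the function names `parse_header` and `parse_genotype_file` and the toy input are hypothetical, and the rule used to recognise the delimiter (a lone non-alphanumeric character in third position) is an assumption made for this example. Extra lines such as `#M` are not handled.

```python
def parse_header(line):
    """Parse '<npop> <nloci> {delimiter} [title]' from the first line."""
    parts = line.split(None, 2)
    npop, nloci = int(parts[0]), int(parts[1])
    rest = parts[2].strip() if len(parts) > 2 else ""
    delimiter = None
    first, _, remainder = rest.partition(" ")
    # Assumption: a lone non-alphanumeric character is the allele delimiter.
    if len(first) == 1 and not first.isalnum():
        delimiter, rest = first, remainder.strip()
    return npop, nloci, delimiter, rest


def parse_genotype_file(text, name_width=10):
    """Read allozyme or microsatellite data into plain Python structures."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    npop, nloci, delimiter, title = parse_header(lines[0])
    populations, pos = [], 1
    for _ in range(npop):
        head = lines[pos].split(None, 1)
        n_ind = int(head[0])
        pop_title = head[1].strip() if len(head) > 1 else ""
        pos += 1
        individuals = []
        for ln in lines[pos:pos + n_ind]:
            name = ln[:name_width].strip()      # fixed-width name field
            genotypes = ln[name_width:].split()  # one token per locus
            if len(genotypes) != nloci:
                raise ValueError(f"{name}: expected {nloci} loci, got {len(genotypes)}")
            individuals.append((name, genotypes))
        pos += n_ind
        populations.append({"title": pop_title, "individuals": individuals})
    return {"title": title, "delimiter": delimiter,
            "nloci": nloci, "populations": populations}


# Toy input, invented for this sketch.
sample = "\n".join([
    "2 2 . Toy microsatellite data",
    "2 pop A",
    "ind1      42.45 37.31",
    "ind2      42.45 37.?",
    "1 pop B",
    "ind3      43.46 33.37",
])
parsed = parse_genotype_file(sample)
print(parsed["delimiter"], parsed["populations"][0]["individuals"])
```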

sequences or SNPs

non-interleaved data:

<Number of populations> <number of loci> [project title 0-79]
<number of sites for locus 1> <number of sites for locus 2> ...
<Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
<Individual 1 10-10> <data locus 1>
<Individual 2 10-10> <data locus 1>
....
<Individual 1 10-10> <data locus 2>
<Individual 2 10-10> <data locus 2>
....
<Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
<Individual 1 10-10> <data locus 1>
<Individual 2 10-10> <data locus 1>
....
<Individual 1 10-10> <data locus 2>
<Individual 2 10-10> <data locus 2>
....
....
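
The non-interleaved layout can be walked through block by block: header, site counts, then for each population one block of individuals per locus. The Python sketch below illustrates this under stated assumptions and is not MIGRATE's own reader. The function name `parse_sequence_file` and the toy input are hypothetical. The sketch also assumes that a population line carrying a single count applies that count to every locus, and it upper-cases sequences purely for convenience.

```python
def parse_sequence_file(text, name_width=10):
    """Parse non-interleaved sequence data into nested Python structures."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    head = lines[0].split(None, 2)
    npop, nloci = int(head[0]), int(head[1])
    title = head[2].strip() if len(head) > 2 else ""
    sites = [int(tok) for tok in lines[1].split()]
    if len(sites) != nloci:
        raise ValueError("expected one site count per locus")

    pos, populations = 2, []
    for _ in range(npop):
        tokens = lines[pos].split()
        counts = []
        for tok in tokens[:nloci]:          # leading integers are the counts
            if not tok.isdigit():
                break
            counts.append(int(tok))
        pop_title = " ".join(tokens[len(counts):])
        if len(counts) == 1:
            # Assumption: a single count applies to every locus.
            counts = counts * nloci
        if len(counts) != nloci:
            raise ValueError("expected one individual count per locus")
        pos += 1
        loci = []
        for n_sites, n_ind in zip(sites, counts):
            chunk = lines[pos:pos + n_ind]
            if len(chunk) != n_ind:
                raise ValueError("file ends before all individuals were read")
            records = []
            for ln in chunk:
                name = ln[:name_width].strip()
                # Blanks inside the sequence are dropped.
                seq = "".join(ln[name_width:].split()).upper()
                if len(seq) != n_sites:
                    raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
                records.append((name, seq))
            loci.append(records)
            pos += n_ind
        populations.append({"title": pop_title, "loci": loci})
    return {"title": title, "sites": sites, "populations": populations}


# Toy input, invented for this sketch.
sample = "\n".join([
    "2 2 Toy data set (2 loci)",
    "8 6",
    "2 2 pop1",
    "eis       ACACCCAA",
    "zwo       ACACAAAA",
    "eis       ACGCGG",
    "zwo       ACGCGA",
    "1 pop2",
    "vier      CAGC GCGC",
    "vier      TCGACT",
])
parsed = parse_sequence_file(sample)
print(parsed["populations"][1]["loci"])
```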

interleaved data (no longer supported by MIGRATE):

<Number of populations> <number of loci> [project title 0-79]
<number of sites for locus 1> <number of sites for locus 2> ...
<Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
<Individual 1 10-10> <data locus 1 part 1>
<Individual 2 10-10> <data locus 1 part 1>
....
<data ind1 locus 1 part 2>
<data ind2 locus 1 part 2>
....
<Individual 1 10-10> <data locus 2 part 1>
<Individual 2 10-10> <data locus 2 part 1>
....
<data ind1 locus 2 part 2>
<data ind2 locus 2 part 2>
....

SNPs in HapMap format:

assumes that each SNP is biallelic
<allele> contains the nucleotide
<number> contains the number of individuals with the specific allele
<total> is the sum of both numbers

<Number of populations> <number of loci> [project title 0-79]
<Any Number> <title for population 0-79>
<Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
<Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
....
<Any Number> <title for population 0-79>
<Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
<Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
....
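
Each HapMap-style line carries six fields, and the last one should equal the sum of the two allele counts. A small Python sketch of a line reader with that consistency check follows. The names `SnpCount` and `parse_hapmap_line` are hypothetical, and the sketch splits on any whitespace (tabs or blanks), which is an assumption made for convenience.

```python
from typing import NamedTuple


class SnpCount(NamedTuple):
    position: int
    allele1: str
    count1: int
    allele2: str
    count2: int
    total: int


def parse_hapmap_line(line):
    """Parse '<position> <allele> <number> <allele> <number> <total>'."""
    fields = line.split()  # accepts tabs as well as blanks
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    pos, a1, n1, a2, n2, total = fields
    rec = SnpCount(int(pos), a1.upper(), int(n1), a2.upper(), int(n2), int(total))
    # <total> must be the sum of both allele counts.
    if rec.count1 + rec.count2 != rec.total:
        raise ValueError(f"position {rec.position}: counts do not add up to total")
    return rec


print(parse_hapmap_line("1011\tA\t2\tG\t1\t3"))
```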

Examples

Enzyme electrophoretic data (infinite allele model)

Genotypes
missing data: “?”
multi-character allele codes can be used when a delimiter is given

2 populations and 11 loci, with 3 and 2 individuals per population:

 2 11 Migration rates between two Turkish frog populations
3 Akcapinar (between Marmaris and Adana)
PB1058    ee bb ab bb bb aa aa bb ?? cc aa
PB1059    ee bb ab bb bb aa aa bb bb cc aa
PB1060    ee bb b? bb ab aa aa bb bb cc aa
2 Ezine (between Selcuk and Dardanelles)
PB16843   ee bb ab bb aa aa aa cc bb cc aa
PB16844   ee bb bb bb ab aa aa cc bb cc aa

same data but with / as separator:

 2 11 / Migration rates between two Turkish frog populations
3 Akcapinar (between Marmaris and Adana)
PB1058    e/e b/b a/b b/b b/b a/a a/a b/b ?/? c/c Rs/Rf
PB1059    e/e b/b a/b b/b b/b a/a a/a b/b b/b c/c Rs/Rs
PB1060    e/e b/b b/? b/b a/b a/a a/a b/b b/b c/c Rs/Rs
2 Ezine (between Selcuk and Dardanelles)
PB16843   e/e b/b a/b b/b a/a a/a a/a c/c b/b c/c Rf/Rf
PB16844   e/e b/b b/b b/b a/b a/a a/a c/c b/b c/c Rf/Rs
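
The two codings above differ only in how a genotype token is split into its two alleles. The Python sketch below shows one way to do this; the function name `split_genotype` is hypothetical. It assumes that, without a delimiter, each allele is exactly one character, and it maps the missing-data symbol "?" to `None`.

```python
def split_genotype(token, delimiter=None):
    """Return the two alleles of one locus; '?' (missing) becomes None."""
    if delimiter is None:
        # Assumption: without a delimiter each allele is a single character.
        if len(token) != 2:
            raise ValueError("without a delimiter a genotype must be two characters")
        alleles = list(token)
    else:
        alleles = token.split(delimiter)
        if len(alleles) != 2:
            raise ValueError(f"expected two alleles in {token!r}")
    return tuple(None if allele == "?" else allele for allele in alleles)


print(split_genotype("ab"))
print(split_genotype("b?"))
print(split_genotype("Rs/Rf", "/"))
```

The same function also covers delimited microsatellite genotypes, for instance `split_genotype("35.?", ".")`.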

Microsatellite data

The third argument on the first line has to be a delimiter character (e.g. “.”)
Genotypes
Each individual has two alleles
a homozygous individual needs to be coded with both alleles, e.g. 6.6 (“.” is the delimiter)
missing data: “?”

Alleles are coded as REPEAT NUMBERS:

 2 3 . Rana lessonae: Seeruecken versus Tal
2   Riedtli near Guendelhart-Hoerhausen
0         42.45 37.31 18.18
0         42.45 37.33 18.16
4   Tal near Steckborn
1         43.46 33.37 18.18
1         44.46 33.35 19.18
1         44.46 35.? 18.18
1         43.42 35.31 20.18

Alleles encoded as FRAGMENT LENGTH:

extra line with repeat number, starts with #M

 2 3 . Rana lessonae: Seeruecken versus Tal
#M 2 2 2
2 Riedtli near Guendelhart-Hoerhausen
0         25.27 137.131 218.218
0         27.27 218.216
2 Tal near Steckborn
1         23.25 135.? 218.218
1         23.23 135.131 220.218

Sequence data

The individual name is followed by the base sequence of that individual
each character is one of the letters A, B, C, D, G, H, K, M, N, O, R, S, T, U, V, W, X, Y, ?, or -
blanks are ignored (this allows GenBank and EMBL sequence entries to be read with minimum editing)
characters can be either upper or lower case
the characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions:

Symbol	Meaning
A	Adenine
G	Guanine
C	Cytosine
T	Thymine
U	Uracil
Y	pYrimidine (C or T)
R	puRine (A or G)
W	“Weak” (A or T)
S	“Strong” (C or G)
K	“Keto” (T or G)
M	“aMino” (C or A)
B	not A (C or G or T)
D	not C (A or G or T)
H	not G (A or C or T)
V	not T (A or C or G)
X,N,?	unknown (A or C or G or T)
O	deletion
-	deletion
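
The symbol table above translates directly into a lookup table that can be used to check a sequence before it goes into an input file. This is a minimal Python sketch; the names `IUPAC_BASES` and `clean_sequence` are hypothetical. Deletions are represented by an empty string, and upper-casing is done only to simplify the lookup.

```python
# Symbol -> bases it may stand for, following the table above.
IUPAC_BASES = {
    "A": "A", "G": "G", "C": "C", "T": "T", "U": "U",
    "Y": "CT", "R": "AG", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "X": "ACGT", "N": "ACGT", "?": "ACGT",
    "O": "", "-": "",  # deletion
}


def clean_sequence(raw):
    """Drop blanks, upper-case, and reject symbols outside the table."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - set(IUPAC_BASES))
    if bad:
        raise ValueError(f"unsupported symbols: {bad}")
    return seq


print(clean_sequence("acgt nnry-"))
print(IUPAC_BASES["B"])
```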

examples:

two populations with a single DNA-locus:

2 1 Make believe data set using simulated data (1 locus)
50
3 Tallahassee (Mars)
Peter     ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
Donald    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
Christian ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
3 St. Marks
Lucrezia  ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
Isabel    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
Yasmine   ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC

non-interleaved (2 populations with 2 loci):

   2 2 Make believe data set using simulated data (2 loci)
50 46
3 3   pop1
eis       ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
zwo       ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
drue      ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
eis       ACGCGGCGCGCGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
zwo       ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
drue      ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
2   pop2
vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATGGTAGAGCGGAG
fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATGTTAGAGCGGCA
vier      TCGACTAGATCTGCAGCACATACGAGGGTCATGCGTCCCAGATGTG
fuefLoc2  TCGACTAGATATGCAGCAAATACGAGGGGCATGCGTCCCAGATGTG

interleaved (2 populations with 2 loci) (no longer supported by MIGRATE):

   2 2 Make believe data set using simulated data (2 loci, interleaved)
50 46
3 2   pop1
eis       ACACCCAACACGGCCCGCGGACA
zwo       ACACAAAACACGGCCCGCGGACA
drue      ATACCCAGCACGGCCGGCGGACA
          GGGGCTCGAGGGATCACTGACTGGCAC
          GGGGCTCGAGGGGTCACTGAGTGGCAC
          GGGGCTCGAGGGAGCACTGAGTGGAAC
eis       ACGCGGCGCGCGAACGAAGACCA
zwo       ACGCGGCGCGAGAACGAAGACCA
          AATCTTCTTGATCCCCAAGTGTC
          AATCTTCTTGATCCCCAAGTGTC
2 2 pop2
vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATG
fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATG
          GTAGAGCGGAG
          TTAGAGCGGCA
          TCGACTAGATCTG CAGCACATAC
          TCGACTAGATATG CAGCAAATAC
          GAGGGTCATGCGTCCCAGATGTG
          GAGGGGCATGCGTCCCAGATGTG

SNP data

uses the same nucleotide nomenclature as the sequence data
same format as sequence data
linked SNP: more than one site on one line
unlinked SNP: one site per line

two formats:

N: nucleotide format

N 2 2 Make believe data set using simulated data (2 population and 2 loci)
1 4
3 3 pop1
ind1      A
ind2      A
ind3      A
ind1      ACAC
ind2      ACAC
ind3      ACGC
2 pop2
ind4      C
ind5      C
ind4      TGGA
ind5      TCGA

H: HapMap format

# using the HapMap data format, but does not (yet) produce the same result as the dataset above
H 2 2 Make believe data set using simulated data (2 population and 2 loci)
3 pop1
1    A 3 C 0 3
1000 A 3 T 0 3
1010 C 3 G 0 3
1011 A 2 G 1 3
1015 C 3 A 0 3
2 pop2
1    A 0 C 2 2
1000 A 0 T 2 2
1010 C 1 G 1 2
1011 A 0 G 2 2
1015 C 0 A 2 2

How to cite

Beerli, P. (2009) How to use MIGRATE or why are Markov chain Monte Carlo programs difficult to use? In G. Bertorelle, M. W. Bruford, H. C. Hauffe, A. Rizzoli, and C. Vernesi, editors, Population Genetics for Animal Conservation, volume 17 of Conservation Biology, pages 42-79. Cambridge University Press, Cambridge, UK.

Master's thesis, Heidi Lischer
