MIGRATE documentation

Version 3.2.6 (13. October 2010)
MIGRATE estimates population parameters (effective population sizes and migration rates) of n populations from genetic data. It uses a coalescent-theory approach that takes into account the history of mutations and the uncertainty of the genealogy. The parameter estimates are obtained by either a maximum likelihood (ML) approach or Bayesian inference (BI).

Program information

  • MacOS X
  • Linux
  • Sun Solaris
  • Windows

Data type handled

  • DNA sequence
  • SNP
  • Microsatellite
  • Standard (Electrophoretic marker)

Input Files

Syntax:

  • <token>: the token is obligatory
  • [token]: the token is optional
  • {token}: the token is obligatory for some kinds of data
  • <token1|token2>: choose one of the tokens
  • <individual1 10-10>: the token must be exactly 10 characters long; a range such as 0-79 gives the minimum and maximum length
  • The characters of any word token can normally include special characters, punctuation, and blanks (e.g., Ind1 02 @ is legal)


Enzyme electrophoretic data or microsatellite data look like this:

<Number of populations> <number of loci> {delimiter between alleles} [project title 0-79]
<Number of individuals> <title for population 0-79>
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....
<Number of individuals> <title for population 0-79>
<Individual 1 10-10> <data>
<Individual 2 10-10> <data>
....
  • the delimiter is needed for microsatellite data
  • the project title is optional
  • by default, the individual name must be exactly 10 characters long
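
The layout above can be read with a few lines of Python. The following is a minimal sketch and not part of MIGRATE: the function name parse_allelic_file is hypothetical, and the sketch assumes that a delimiter, when present, is a single non-alphanumeric character given as the third token of the first line, and that genotypes written without a delimiter use one character per allele.

```python
def parse_allelic_file(text, name_width=10):
    """Parse enzyme or microsatellite data laid out as described above.

    Assumption (not taken from MIGRATE itself): a delimiter, if present,
    is a single non-alphanumeric character standing alone as the third
    token of the first line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split(None, 2)
    n_pops, n_loci = int(header[0]), int(header[1])
    delimiter, title = None, ""
    if len(header) > 2:
        first, _, remainder = header[2].partition(" ")
        if len(first) == 1 and not first.isalnum():
            delimiter, title = first, remainder.strip()
        else:
            title = header[2].strip()

    populations = []
    pos = 1
    for _ in range(n_pops):
        count, _, pop_title = lines[pos].strip().partition(" ")
        pos += 1
        individuals = []
        for _ in range(int(count)):
            line = lines[pos]
            pos += 1
            # The name occupies a fixed-width field; the data follow it.
            name = line[:name_width].strip()
            genotypes = []
            for token in line[name_width:].split():
                if delimiter:
                    genotypes.append(tuple(token.split(delimiter)))
                else:
                    # Without a delimiter, each character is one allele.
                    genotypes.append(tuple(token))
            if len(genotypes) != n_loci:
                raise ValueError(
                    f"{name}: expected {n_loci} loci, got {len(genotypes)}")
            individuals.append((name, genotypes))
        populations.append((pop_title.strip(), individuals))
    return title, delimiter, populations
```

The sketch reads the name from the first 10 columns rather than splitting on blanks, because names may themselves contain blanks. It expects the lines to start in the first column, so any display indentation has to be stripped beforehand.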


Sequences or SNPs

  • non-interleaved data:
    <Number of populations> <number of loci> [project title 0-79]
    <number of sites for locus 1> <number of sites for locus 2> ...
    <Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
    <Individual 1 10-10> <data locus 1>
    <Individual 2 10-10> <data locus 1>
    ....
    <Individual 1 10-10> <data locus 2>
    <Individual 2 10-10> <data locus 2>
    ....
    <Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
    <Individual 1 10-10> <data locus 1>
    <Individual 2 10-10> <data locus 1>
    ....
    <Individual 1 10-10> <data locus 2>
    <Individual 2 10-10> <data locus 2>
    ....
  • interleaved data (no longer supported by MIGRATE):
    <Number of populations> <number of loci> [project title 0-79]
    <number of sites for locus 1> <number of sites for locus 2> ...
    <Number of individuals locus 1> <#ind locus 2> ... <#ind locus n> <title for population 0-79>
    <Individual 1 10-10> <data locus 1 part 1>
    <Individual 2 10-10> <data locus 1 part 1>
    ....
    <data ind1 locus 1 part 2>
    <data ind2 locus 1 part 2>
    ....
    <Individual 1 10-10> <data locus 2 part 1>
    <Individual 2 10-10> <data locus 2 part 1>
    ....
    <data ind1 locus 2 part 2>
    <data ind2 locus 2 part 2>
    ....
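
As an illustration of the non-interleaved layout, the sketch below writes such a file from data held in memory. The function name write_sequence_file and the nested-list input structure are hypothetical choices made for this example; the sketch takes the number of sites per locus from the first population and pads every name to 10 characters.

```python
def write_sequence_file(title, populations, name_width=10):
    """Return non-interleaved sequence input as one string.

    populations: list of (population_title, loci), where loci holds one
    list per locus of (individual_name, sequence) pairs.
    """
    n_loci = len(populations[0][1])
    # Sites per locus, taken from the first sequence of each locus.
    sites = [len(locus[0][1]) for locus in populations[0][1]]
    out = [f"{len(populations)} {n_loci} {title}",
           " ".join(str(s) for s in sites)]
    for pop_title, loci in populations:
        if len(loci) != n_loci:
            raise ValueError(f"{pop_title}: expected {n_loci} loci")
        # One individual count per locus, followed by the population title.
        counts = " ".join(str(len(locus)) for locus in loci)
        out.append(f"{counts} {pop_title}")
        for locus, n_sites in zip(loci, sites):
            for name, seq in locus:
                if len(name) > name_width:
                    raise ValueError(f"name too long: {name!r}")
                if len(seq) != n_sites:
                    raise ValueError(f"{name}: expected {n_sites} sites")
                out.append(name.ljust(name_width) + seq)
    return "\n".join(out) + "\n"
```

All individuals of locus 1 are written before those of locus 2 within each population, which is what distinguishes this layout from the interleaved one.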

SNPs in HapMap format:

  • assumes that each SNP is biallelic
  • <allele> is the nucleotide
  • <number> is the number of individuals carrying that allele
  • <total> is the sum of the two numbers
    <Number of populations> <number of loci> [project title 0-79]
    <Any Number> <title for population 0-79>
    <Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
    <Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
    ....
    <Any Number> <title for population 0-79>
    <Position on chromosome locus1> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
    <Position on chromosome locus2> <TAB><allele><TAB><number><TAB><allele><TAB><number><TAB><total>
    ....
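
A single SNP record of this layout can be checked as sketched below. The function name parse_hapmap_snp is hypothetical. As a leniency that is an assumption of this sketch, fields are split on any whitespace rather than strictly on TABs.

```python
def parse_hapmap_snp(line):
    """Split one SNP record into (position, {allele: count}, total)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    position = int(fields[0])
    allele1, count1 = fields[1], int(fields[2])
    allele2, count2 = fields[3], int(fields[4])
    total = int(fields[5])
    # The last field must equal the sum of the two allele counts.
    if count1 + count2 != total:
        raise ValueError(f"position {position}: counts do not sum to total")
    return position, {allele1: count1, allele2: count2}, total
```

For example, parse_hapmap_snp("1011\tA\t2\tG\t1\t3") returns (1011, {"A": 2, "G": 1}, 3).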

Examples

Enzyme electrophoretic data (infinite allele model)

  • Genotypes
  • missing data: “?”
  • can use multi-character coding when you use a delimiter
  • 2 populations and 11 loci, with 3 and 2 individuals per population:
     2 11 Migration rates between two Turkish frog populations
    3 Akcapinar (between Marmaris and Adana)
    PB1058    ee bb ab bb bb aa aa bb ?? cc aa
    PB1059    ee bb ab bb bb aa aa bb bb cc aa
    PB1060    ee bb b? bb ab aa aa bb bb cc aa
    2 Ezine (between Selcuk and Dardanelles)
    PB16843   ee bb ab bb aa aa aa cc bb cc aa
    PB16844   ee bb bb bb ab aa aa cc bb cc aa
  • same data but with / as delimiter (the last locus uses multi-character allele names):
     2 11 / Migration rates between two Turkish frog populations
    3 Akcapinar (between Marmaris and Adana)
    PB1058    e/e b/b a/b b/b b/b a/a a/a b/b ?/? c/c Rs/Rf
    PB1059    e/e b/b a/b b/b b/b a/a a/a b/b b/b c/c Rs/Rs
    PB1060    e/e b/b b/? b/b a/b a/a a/a b/b b/b c/c Rs/Rs
    2 Ezine (between Selcuk and Dardanelles)
    PB16843   e/e b/b a/b b/b a/a a/a a/a c/c b/b c/c Rf/Rf
    PB16844   e/e b/b b/b b/b a/b a/a a/a c/c b/b c/c Rf/Rs

Microsatellite data

  • The third argument on the first line has to be a delimiter character (e.g., “.”)
  • Genotypes
  • Each individual has two alleles
  • homozygous individuals must be coded with both alleles, e.g., 6.6 (“.” is the delimiter)
  • missing data: “?”
  • Alleles are coded as REPEAT NUMBERS:
     2 3 . Rana lessonae: Seeruecken versus Tal
    2   Riedtli near Guendelhart-Hoerhausen
    0         42.45 37.31 18.18
    0         42.45 37.33 18.16
    4   Tal near Steckborn
    1         43.46 33.37 18.18
    1         44.46 33.35 19.18
    1         44.46 35.? 18.18
    1         43.42 35.31 20.18
  • Alleles encoded as FRAGMENT LENGTH:
    • an extra line starting with #M gives the repeat length of each locus
       2 3 . Rana lessonae: Seeruecken versus Tal
      #M 2 2 2
      2 Riedtli near Guendelhart-Hoerhausen
      0         25.27 137.131 218.218
      0         27.27 218.216
      2 Tal near Steckborn
      1         23.25 135.? 218.218
      1         23.23 135.131 220.218
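
The genotype rules listed above (two alleles per individual, a delimiter between them, “?” for a missing allele) can be expressed as a small parser. This is a sketch only: the function names are hypothetical, and representing a missing allele as None is a choice made for this example.

```python
def parse_microsat_genotype(token, delimiter="."):
    """Return the two alleles of one genotype as ints; '?' becomes None."""
    parts = token.split(delimiter)
    if len(parts) != 2:
        # A homozygote must still list both alleles, e.g. 6.6.
        raise ValueError(f"genotype {token!r} does not have two alleles")
    return tuple(None if part == "?" else int(part) for part in parts)


def parse_microsat_line(line, delimiter=".", name_width=10):
    """Split one individual's line into its name and a list of genotypes."""
    name = line[:name_width].strip()
    tokens = line[name_width:].split()
    return name, [parse_microsat_genotype(tok, delimiter) for tok in tokens]
```

The same functions work for repeat numbers and for fragment lengths, since both are written as integers.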

Sequence data

  • The individual name is followed by the base sequence of that individual
  • each character is one of the letters A, B, C, D, G, H, K, M, N, O, R, S, T, U, V, W, X, Y, ?, or -
  • Blanks are ignored (this allows GenBank and EMBL sequence entries to be read with minimal editing)
  • characters can be either upper or lower case
  • the characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions:
    Symbol   Meaning
    A        Adenine
    G        Guanine
    C        Cytosine
    T        Thymine
    U        Uracil
    Y        pYrimidine (C or T)
    R        puRine (A or G)
    W        “Weak” (A or T)
    S        “Strong” (C or G)
    K        “Keto” (T or G)
    M        “aMino” (C or A)
    B        not A (C or G or T)
    D        not C (A or G or T)
    H        not G (A or C or T)
    V        not T (A or C or G)
    X, N, ?  unknown (A or C or G or T)
    O        deletion
    -        deletion
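
The code table translates directly into a lookup that can be used to check sequences before they are written to an input file. The names IUPAC_CODES, clean_sequence, and possible_bases below are hypothetical. Mapping the deletion symbols to an empty set and keeping U as its own base are choices made for this sketch, not statements about how MIGRATE treats them internally.

```python
# Symbol -> bases it may stand for, following the table above.
IUPAC_CODES = {
    "A": "A", "G": "G", "C": "C", "T": "T", "U": "U",
    "Y": "CT", "R": "AG", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "X": "ACGT", "N": "ACGT", "?": "ACGT",
    "O": "", "-": "",  # deletions: no base at this site
}


def clean_sequence(raw):
    """Drop blanks, convert to upper case, and reject unknown symbols."""
    seq = "".join(raw.split()).upper()
    unknown = sorted(set(seq) - set(IUPAC_CODES))
    if unknown:
        raise ValueError(f"illegal characters: {''.join(unknown)}")
    return seq


def possible_bases(symbol):
    """Return the set of bases a single symbol may represent."""
    return set(IUPAC_CODES[symbol.upper()])
```

For instance, clean_sequence("acg t-n") returns "ACGT-N", and possible_bases("y") returns the set containing C and T.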

examples:

  • two populations with a single DNA-locus:
    2 1 Make believe data set using simulated data (1 locus)
    50
    3 Tallahassee (Mars)
    Peter     ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
    Donald    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
    Christian ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
    3 St. Marks
    Lucrezia  ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
    Isabel    ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
    Yasmine   ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
  • non-interleaved (2 populations with 2 loci):
    2 2 Make believe data set using simulated data (2 loci)
    50 46
    3 3   pop1
    eis       ACACCCAACACGGCCCGCGGACAGGGGCTCGAGGGATCACTGACTGGCAC
    zwo       ACACAAAACACGGCCCGCGGACAGGGGCTCGAGGGGTCACTGAGTGGCAC
    drue      ATACCCAGCACGGCCGGCGGACAGGGGCTCGAGGGAGCACTGAGTGGAAC
    eis       ACGCGGCGCGCGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
    zwo       ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
    drue      ACGCGGCGCGAGAACGAAGACCAAATCTTCTTGATCCCCAAGTGTC
    2   pop2
    vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATGGTAGAGCGGAG
    fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATGTTAGAGCGGCA
    vier      TCGACTAGATCTGCAGCACATACGAGGGTCATGCGTCCCAGATGTG
    fuefLoc2  TCGACTAGATATGCAGCAAATACGAGGGGCATGCGTCCCAGATGTG
  • interleaved (2 populations with 2 loci) (no longer supported by MIGRATE):
    2 2 Make believe data set using simulated data (2 loci, interleaved)
    50 46
    3 2   pop1
    eis       ACACCCAACACGGCCCGCGGACA
    zwo       ACACAAAACACGGCCCGCGGACA
    drue      ATACCCAGCACGGCCGGCGGACA
              GGGGCTCGAGGGATCACTGACTGGCAC
              GGGGCTCGAGGGGTCACTGAGTGGCAC
              GGGGCTCGAGGGAGCACTGAGTGGAAC
    eis       ACGCGGCGCGCGAACGAAGACCA
    zwo       ACGCGGCGCGAGAACGAAGACCA
              AATCTTCTTGATCCCCAAGTGTC
              AATCTTCTTGATCCCCAAGTGTC
    2 2 pop2
    vier      CAGCGCGCGTATCGCCCCATGTGGTTCGGCCAAAGAATG
    fuef      CAGCGCGAGTCTCGCCCCATGGGGTTAGGCCAAATAATG
              GTAGAGCGGAG
              TTAGAGCGGCA
              TCGACTAGATCTG CAGCACATAC
              TCGACTAGATATG CAGCAAATAC
              GAGGGTCATGCGTCCCAGATGTG
              GAGGGGCATGCGTCCCAGATGTG

SNP data

  • uses the same nucleotide nomenclature as the sequence data
  • same format as sequence data
  • linked SNP: more than one site on one line
  • unlinked SNP: one site per line
  • two formats:
    • N: nucleotide format
      N 2 2 Make believe data set using simulated data (2 population and 2 loci)
      1 4
      3 3 pop1
      ind1      A
      ind2      A
      ind3      A
      ind1      ACAC
      ind2      ACAC
      ind3      ACGC
      2 pop2
      ind4      C
      ind5      C
      ind4      TGGA
      ind5      TCGA
    • H: HapMap format
      # using the HapMap data format, but does not (yet) produce the same result as the dataset above
      H 2 2 Make believe data set using simulated data (2 population and 2 loci)
      3 pop1
      1    A 3 C 0 3
      1000 A 3 T 0 3
      1010 C 3 G 0 3
      1011 A 2 G 1 3
      1015 C 3 A 0 3
      2 pop2
      1    A 0 C 2 2
      1000 A 0 T 2 2
      1010 C 1 G 1 2
      1011 A 0 G 2 2
      1015 C 0 A 2 2
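
The two SNP examples describe the same individuals: the counts in the H format are the per-site nucleotide counts of the N format. A minimal sketch of that tally follows; the function name allele_counts is hypothetical. It reports only the nucleotides actually observed, so the second allele of a monomorphic site (listed with count 0 in the H format) cannot be recovered from the sequences alone.

```python
from collections import Counter


def allele_counts(sequences):
    """Count nucleotides per site across the aligned sequences of one locus."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all sequences of a locus must have the same length")
    # zip(*sequences) walks through the alignment column by column.
    return [Counter(column) for column in zip(*sequences)]


# Locus 2 of pop1 from the nucleotide example above.
for site, counts in enumerate(allele_counts(["ACAC", "ACAC", "ACGC"]), start=1):
    print(site, dict(counts))
```

The third site yields 2 for A and 1 for G, matching the record at position 1011 for pop1 in the HapMap example.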

How to cite

Beerli, P. (2009) How to use migrate or why are Markov chain Monte Carlo programs difficult to use? In G. Bertorelle, M. W. Bruford, H. C. Hauffe, A. Rizzoli, and C. Vernesi, editors, Population Genetics for Animal Conservation, volume 17 of Conservation Biology, pages 42-79. Cambridge University Press, Cambridge UK.

migrate.txt · Last modified: 2011/07/14 15:02 by heidi