LAMARC

version 2.1.2b
LAMARC is a program for Likelihood Analysis with Metropolis Algorithm using Random Coalescence. It estimates effective population sizes, population exponential growth rates, a recombination rate, and past migration rates for one to n populations, assuming a migration-matrix model with asymmetric migration rates and different subpopulation sizes.

Program information

written in C++
runs on Linux, Mac OS X and Windows

Data type handled

DNA sequence
RNA sequence
SNP
Microsatellites
electrophoretic data

Input Files

LAMARC File Converter:

converts PHYLIP, RECOMBINE and MIGRATE input files into a LAMARC XML file

LAMARC XML file:

the whole file is enclosed in <lamarc> and </lamarc> tags

Data section:
contains the actual molecular data, and additional information used to interpret it

enclosed in <data> tags
<region>:
- divides the molecular data into “regions”
- a region is a stretch of genetic information that is closely linked on the same chromosome and has a known map
- use multiple regions for data composed of several disconnected pieces, or pieces whose connections are not known
- a region can be named with the optional name attribute
<effective-popsize> (optional):
- specify a different relative effective population size for each <region>
<spacing>:
- gives information about the relative positions of the segments
<block> (within <spacing>):
- each segment is described by one <block> tag
- gives information about the position of the segment itself and the positions of the markers within the segment
- <length>: indicates the total length of the segment (important for SNP data, where the segment is longer than the number of markers)
- <map-position>: gives the position of this segment on an overall map of the region (the point at which sequencing or scanning began)
- <locations>: a list of marker positions within the region
- <offset>: the origin of the segment's numbering system with respect to the boundaries of the region
<population>:
- within each region you can list one or more populations
- if you list <population> tags under more than one region, they are matched by their name attributes, so population names are not optional
- <individual>: represents all the data for that region that comes from a single biological individual, given as one or more samples (for example, the two haplotypes of a diploid individual). The name attribute is optional
  - <sample>: contains the actual data (sequences or marker values) for one haplotype
  - <phase> (optional): indicates uncertainty about the phase of certain sites. It has an obligatory type attribute:
    - type="known": the list that follows gives all sites whose phase is known and therefore need not be reconsidered during the run
    - type="unknown": the list that follows gives all sites whose phase is unknown and should therefore be reconsidered
    - valid site numbers lie between the segment's offset (default 1) and the segment length plus the offset; if the segment is longer than the number of markers (as for SNP data), use the same values as in the <locations> tag of the <block> section
  - <datablock> (one per segment per sample):
    - contains the sequences themselves
    - each datablock must have a type attribute indicating the kind of data it contains: type="DNA" for full DNA or RNA sequences, type="SNP" for SNP sequences, and type="Microsat" for microsatellites
    - Sequence data must be aligned and of the same length for all samples within a region
    - “Unknown nucleotide” codes (X, N or -) can be used to fill in missing or unknown sequence
    - Upper- and lowercase nucleotide symbols are treated equivalently
    - Deletions should be coded as unknown; they are treated as unknown data
    - Microsatellite data are coded as the number of repeats, with “?” standing for unknown data. Successive microsatellites within the same region are separated by blank spaces.
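
The nesting described above can be summarised in one small file. This is a sketch, not an excerpt from the LAMARC manual: it wraps a single 12-base DNA segment in the <lamarc> root, and the <effective-popsize> value of 0.5 and both sequences are hypothetical, chosen only to show where each tag goes. <phase> and <locations> are left out, as are any other settings a real LAMARC file may contain.

<lamarc>
  <data>
    <region name="Alcohol dehydrogenase">
      <!-- optional; 0.5 is a hypothetical value for illustration -->
      <effective-popsize>0.5</effective-popsize>
      <!-- one block element per segment; this region has a single segment -->
      <spacing>
        <block>
          <map-position>1</map-position>
          <length>12</length>
          <offset>1</offset>
        </block>
      </spacing>
      <population name="Seattle">
        <individual name="Mary">
          <sample>
            <!-- one datablock per segment per sample -->
            <datablock type="DNA">CTTGTAACCTAA</datablock>
          </sample>
        </individual>
        <individual name="Jon">
          <sample>
            <datablock type="DNA">CTTGTAACGTAA</datablock>
          </sample>
        </individual>
      </population>
    </region>
  </data>
</lamarc>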

Examples:

A minimal DNA data block describing a single region, a single segment, a single population, and two individuals with a single haplotype each. Note that although the two blocks of data are formatted differently, they contain the same number of bases (92); this is required because all blocks corresponding to a single segment must have the same length:

<data>
  <region name="Alcohol dehydrogenase">
    <population name="Seattle">
      <individual name="Mary">
        <sample>
          <datablock type="DNA">
            CTTGTAACCTAATGGCTTCCGAGATGGACTAGTGAGCCGCTTTCTC
            TACACCAACGCAGCACATGACGGTCTTACATGCGGAGCCCGCTCAA
          </datablock>
        </sample>
      </individual>
      <individual name="Jon">
        <sample>
          <datablock type="DNA">
            CTTGTAACCTAATGGCTTCCGA
            GATGGACTAGTGAGCCGCTTTCTC
            TACACCAACGCAGCACATGACG
            GTCTTACATGCGGAGCCCGCTCAA
          </datablock>
        </sample>
      </individual>
    </population>
  </region>
</data>
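
Because populations in different regions are matched by their name attributes, a data set with two unlinked regions repeats the same population name in each region. In this hypothetical sketch the second region's name ("Region 2") and all sequences are invented; the sequence length must agree within a region (12 bases in the first, 8 in the second) but may differ between regions.

<data>
  <region name="Alcohol dehydrogenase">
    <population name="Seattle">
      <individual name="Mary">
        <sample>
          <datablock type="DNA">CTTGTAACCTAA</datablock>
        </sample>
      </individual>
      <individual name="Jon">
        <sample>
          <datablock type="DNA">CTTGTAACGTAA</datablock>
        </sample>
      </individual>
    </population>
  </region>
  <!-- hypothetical second region, not linked to the first -->
  <region name="Region 2">
    <!-- same name as above, so both entries refer to one population -->
    <population name="Seattle">
      <individual name="Mary">
        <sample>
          <datablock type="DNA">GATTACAG</datablock>
        </sample>
      </individual>
      <individual name="Jon">
        <sample>
          <!-- N marks an unknown nucleotide -->
          <datablock type="DNA">GATTNCAG</datablock>
        </sample>
      </individual>
    </population>
  </region>
</data>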

A microsatellite data block that also illustrates the use of multiple samples per individual. In this example, “Mary” is heterozygous for the second microsatellite and homozygous for the other five, and “?” marks Jon's unknown value at the fifth locus:

<data>
  <region name="Alcohol dehydrogenase">
    <population name="Seattle">
      <individual name="Mary">
        <sample>
          <datablock type="Microsat">
              7 8 14 7 9 21
          </datablock>
        </sample>
        <sample>
          <datablock type="Microsat">
              7 9 14 7 9 21
          </datablock>
        </sample>
      </individual>
      <individual name="Jon">
        <sample>
          <datablock type="Microsat">
              7 9 14 7 10 23
          </datablock>
        </sample>
        <sample>
          <datablock type="Microsat">
              8 9 13 7 ? 23
          </datablock>
        </sample>
      </individual>
    </population>
  </region>
</data>
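
For SNP data the <block> information matters most, because the segment is longer than the number of markers: <length> gives the full length of the segment, <locations> lists the positions of the SNPs, and each SNP datablock holds one base per listed location. The sketch below assumes a hypothetical 500-base segment with four SNPs; all positions and bases are invented for illustration.

<data>
  <region name="Alcohol dehydrogenase">
    <spacing>
      <block>
        <map-position>1</map-position>
        <!-- total length of the scanned segment, not the number of SNPs -->
        <length>500</length>
        <!-- hypothetical positions of the four SNPs -->
        <locations>12 47 203 388</locations>
        <offset>1</offset>
      </block>
    </spacing>
    <population name="Seattle">
      <individual name="Mary">
        <sample>
          <!-- one base per SNP location, in the order listed above -->
          <datablock type="SNP">ACGT</datablock>
        </sample>
        <sample>
          <datablock type="SNP">ATGT</datablock>
        </sample>
      </individual>
      <individual name="Jon">
        <sample>
          <datablock type="SNP">GCGT</datablock>
        </sample>
        <sample>
          <datablock type="SNP">ACGA</datablock>
        </sample>
      </individual>
    </population>
  </region>
</data>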

How to cite

Kuhner, M. K., 2006 “LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters.” Bioinformatics 22(6): 768-770.

Master's thesis, Heidi Lischer
