IMa2

http://lifesci.rutgers.edu/~heylab/ProgramsandData/Programs/IMa2/Using_IMa2_8_24_2011.pdf

updated 26.08.2011
The program implements a method for generating posterior probabilities for complex demographic population genetic models. IMa2 works similarly to the older IMa program, with some important additions. In particular, IMa2 can handle data and implement a model for multiple populations (between one and ten sampled populations), not just two populations as was the case with the original IM and IMa programs.

Program information

Written in C++
Unix/Linux
Windows
Mac

Data type handled

DNA
Microsatellites (STR)

Input Files

The format for data files for IMa2 is very similar to that for IM and IMa. The differences are that IMa2 requires two extra lines, one for the number of populations and one for the population tree string.

line 1 - arbitrary text, usually explaining the content of the file
- After line 1, but before line 2, comments can be included to provide explanatory information. Each line of comment must begin with a ‘#’
line 2 - number of populations (npops)
line 3 - population names in order, separated by one or more spaces. This order also corresponds to the order in which the populations are numbered in the population tree and the order in which the data occur for each locus.
line 4 - the population tree string in modified Newick format. The string contains information on the topology of the tree for the sampled populations and on the ordering of the internal nodes in time. These internal nodes correspond to ancestral populations. The ancestral populations are numbered beginning with npops for the most recent ancestral population and proceeding up to 2×(npops-1) for the ancestor of all the sampled populations. Sampled populations in the string are represented by their respective numbers. Ancestral populations are represented by a colon, i.e. ‘:’, followed by their ancestral population number.
- If there is only a single population then the tree string is simply: 0.
- If there are two populations then the tree string is: (0,1):2
line 5 - the number of loci in the data set (integer)
line 6 - basic information for locus 1. This line contains at least five items separated by one or more spaces
1. Locus name (no spaces within the name)
2. The sample sizes for each population at that locus (n0 for population 0, and so on). These numbers do not need to be the same for different loci. If a population is not represented at this locus, a zero is used for that population.
3. the length of the sequence
4. Letter indicating the mutation model (I: IS, H: HKY, S: SSM, J: joint SSM and IS = HapSTR). If this letter is not included on this line, the default is IS. If SSM (S) or HapSTR (J), the letter is followed immediately (no spaces) by the number of linked STR markers within the locus.
5. Inheritance scalar (e.g.: 1 for autosome, 0.75 for X-linked, 0.25 for Y-linked or mtDNA)
6. The mutation rate per year for the locus (not per base pair). This can be left blank, but is needed for estimating parameters on demographic scales. If there are multiple STRs in the locus then there can be multiple mutation rates on this line separated by spaces. If the locus is a HapSTR, then the first mutation rate given applies to the sequence portion of the locus with subsequent values corresponding to STR markers included in the locus.
7. If the mutation rate is given, it can be followed by a range of mutation rates that can be used (with ranges for other loci in the analysis) to set priors on the ratios of mutation rate scalars. The range is entered as an opening parenthesis, the lowest value, a comma, the highest value, and a closing parenthesis (e.g. ‘(0.00001, 0.00004)’). The range must bracket the rate. For a locus with multiple mutation rates and multiple ranges, each range follows its corresponding mutation rate immediately on the line.
line 7 - data for gene copy # 1 from population 0. The first 10 columns are reserved for the sample name. The sequence or allele length (for the SSM model) begins in column 11 of the file. The sequence for a given sample is given all on one line without gaps. For SSM or HapSTR data, the allele length assumes a step size of 1. This means that data from STRs whose repeat unit is longer than 1 base must be converted to counts of the number of repeats (e.g. for a dinucleotide ‘CACACACACACA’ the length would be 6). Any number less than 5 causes the program to stop with an error. If the data are for an SSM model locus and there are multiple STRs, then there will be one integer on each line for each STR, separated by a space. If the locus is HapSTR (joint IS and SSM) then the STR data are given on the line, beginning at column 11, followed by the sequence data. For SSM data, as for other types of data, only one gene copy is represented on each line of the data file. Diploid genotype data must be broken up and listed, with one data line for each gene copy.
line 8 onward - the remainder of the data for locus 1. Each line contains the data for one sample. The data for locus 1 for population 1 immediately follow those for population 0, and so on.
Additional lines for additional loci. Each locus begins with a line containing the information for that locus, in the same format as for the first locus. The sample names and sample sizes for additional loci and the inheritance scalars and mutation model for additional loci do not have to be the same as for locus 1 (generally they are not).
last line - should end with a newline so that the file ends on a blank line
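
The locus information line described above (items 1 to 7) can be split into its fields roughly as follows. This is a minimal Python sketch and not part of IMa2: the function name parse_locus_line and the keys of the returned dictionary are hypothetical, and it assumes that the model letter, when present, is the first item after the sequence length.

```python
import re


def parse_locus_line(line, npops):
    """Split one locus information line into its fields (illustrative only)."""
    # Remove whitespace inside parentheses so "(0.00001, 0.00004)" is one token.
    line = re.sub(r"\([^)]*\)", lambda m: re.sub(r"\s+", "", m.group(0)), line)
    tokens = line.split()

    name = tokens[0]
    sample_sizes = [int(t) for t in tokens[1:1 + npops]]
    length = int(tokens[1 + npops])
    rest = tokens[2 + npops:]

    # The mutation model letter is optional; the default is IS ("I").
    model, n_str = "I", 0
    if rest and rest[0][0].upper() in "IHSJ":
        code = rest.pop(0)
        model = code[0].upper()
        # For S and J the letter is followed directly by the number of STRs.
        # Assumption of this sketch: 0 means that no count was given.
        n_str = int(code[1:]) if len(code) > 1 else 0

    inheritance_scalar = float(rest.pop(0))

    # Remaining tokens: mutation rates, each optionally followed by a range.
    rates = []
    for tok in rest:
        if tok.startswith("("):
            if not rates:
                raise ValueError("a range must follow a mutation rate")
            low, high = (float(x) for x in tok.strip("()").split(","))
            if not low <= rates[-1]["rate"] <= high:
                raise ValueError("the range must bracket the rate")
            rates[-1]["range"] = (low, high)
        else:
            rates.append({"rate": float(tok), "range": None})

    return {
        "name": name,
        "sample_sizes": sample_sizes,
        "length": length,
        "model": model,
        "n_str": n_str,
        "inheritance_scalar": inheritance_scalar,
        "rates": rates,
    }


info = parse_locus_line("strexample 2 2 2 1 S3 1  0.00001  0.000015 0.00008", 3)
print(info["model"], info["n_str"], [r["rate"] for r in info["rates"]])
```

The number of populations has to be passed in because the count of sample-size fields on the line depends on it.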

Example:

A tiny three-locus data set. The mutation rate per year is specified for locus 1 and for each of the three STRs in locus 3, but not for locus 2.

Example data  for IMa
# example data set
3 
pop0 pop1 pop2
((0,1):3,2):4
3
locus1 1 1 2 13 I 0.25 0.0000000008 
pop0_1    ACTACTGTCATGA
pop1_1    AGTACTATCACGA
pop2_1    AGTACTATCACGA
pop2_2    AGTACTATCATGA
hapstrexample 2 1 0 4 J1 0.75
pop1_1    13 GTAC
pop1_2    12 GTAT
pop2_1    12 GTAT
strexample 2 2 2 1 S3 1  0.00001  0.000015 0.00008
strpop01a 23 12 9
strpop01b 26 10 11
strpop11a 25 10 9
strpop11b 31 11 9
strpop21a 26 12 11
strpop21b 26 13 12
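
To make the layout concrete, the example file can be walked through line by line as in the following Python sketch. It is an illustration written for this page and not code from IMa2: the function name read_ima2 and the structure of the returned dictionary are hypothetical. It reads the header, checks that the ancestral populations in the tree string are numbered from npops to 2×(npops-1), and then reads as many data lines per locus as the sample sizes announce, taking the sample name from the first 10 columns.

```python
import re

EXAMPLE = """Example data  for IMa
# example data set
3
pop0 pop1 pop2
((0,1):3,2):4
3
locus1 1 1 2 13 I 0.25 0.0000000008
pop0_1    ACTACTGTCATGA
pop1_1    AGTACTATCACGA
pop2_1    AGTACTATCACGA
pop2_2    AGTACTATCATGA
hapstrexample 2 1 0 4 J1 0.75
pop1_1    13 GTAC
pop1_2    12 GTAT
pop2_1    12 GTAT
strexample 2 2 2 1 S3 1  0.00001  0.000015 0.00008
strpop01a 23 12 9
strpop01b 26 10 11
strpop11a 25 10 9
strpop11b 31 11 9
strpop21a 26 12 11
strpop21b 26 13 12
"""


def read_ima2(text):
    """Read the header and the per-locus data lines (illustrative only)."""
    lines = text.splitlines()
    title = lines[0]
    pos = 1
    # Comment lines may sit between line 1 and the number of populations.
    while lines[pos].startswith("#"):
        pos += 1
    npops = int(lines[pos])
    pop_names = lines[pos + 1].split()
    tree = lines[pos + 2].strip()
    nloci = int(lines[pos + 3])
    pos += 4

    # Ancestral populations follow a colon and run from npops to 2*(npops-1).
    ancestors = sorted(int(n) for n in re.findall(r":(\d+)", tree))
    if ancestors != list(range(npops, 2 * npops - 1)):
        raise ValueError("unexpected ancestral population numbers in tree")

    loci = []
    for _locus in range(nloci):
        fields = lines[pos].split()
        sizes = [int(x) for x in fields[1:1 + npops]]
        pos += 1
        samples = []
        # Data lines are ordered by population: all of pop 0, then pop 1, ...
        for pop, n in enumerate(sizes):
            for _copy in range(n):
                row = lines[pos]
                # Columns 1-10 hold the sample name, data start in column 11.
                samples.append((pop, row[:10].strip(), row[10:].strip()))
                pos += 1
        loci.append({"name": fields[0], "sizes": sizes, "samples": samples})

    return {"title": title, "npops": npops, "pop_names": pop_names,
            "tree": tree, "loci": loci}


data = read_ima2(EXAMPLE)
for locus in data["loci"]:
    print(locus["name"], locus["sizes"], len(locus["samples"]))
```

The population index stored with each sample comes from the sample sizes on the locus line, not from the sample name, since sample names are free text.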

How to cite

Hey, J. 2010b. Isolation with Migration Models for More Than Two Populations. Mol. Biol. Evol. 27:905-920.
Hey, J. 2010a. The Divergence of Chimpanzee Species and Subspecies as Revealed in Multipopulation Isolation-with-Migration Analyses. Mol. Biol. Evol. 27:921-933.

Master's thesis, Heidi Lischer
