IM


updated 10/23/2007
IM is a program for fitting an isolation-with-migration model to haplotype data drawn from two closely related species or populations. Large numbers of loci can be studied simultaneously, and different mutation models can be used. IM estimates the divergence time and the migration that has occurred in the ancestry of the two populations, which may have grown exponentially since the split. Important limitations of the basic model are that it cannot account for changes in population size and that it cannot account for the sizes of founding populations.
The related program IMa also allows log-likelihood ratio tests of nested demographic models. IMa is faster than IM and gives better results because it provides access to the joint posterior density function. It can be used for most, but not all, of the situations and options that IM supports.

Program information

  • written in C
  • Unix/Linux
  • DOS/Windows
  • Mac

Data type handled

Input Files

The input file contains the data for all loci to be considered.

  • line 1 - arbitrary text, usually explaining the content of the file
    • After line 1, but before line 2, comments can be included to provide explanatory information. Each line of comment must begin with a ‘#’
  • line 2 - two population names, for populations 1 and 2 respectively, separated by one or more spaces
  • line 3 - the number of loci in the data set (integer)
  • line 4 - basic information for locus 1. This line contains at least five items separated by one or more spaces
    1. Locus name (no spaces within the name)
    2. n1, the sample size for population 1
    3. n2, the sample size for population 2
    4. the length of the sequence
    5. Letter indicating the mutation model (I: infinite sites, IS; H: HKY; S: stepwise mutation model, SSM; J: joint SSM and IS, i.e. HapSTR). If this letter is not included on the line, the default is IS. For SSM (S) or HapSTR (J), the letter is followed immediately (no spaces) by the number of linked STR markers within the locus.
    6. Inheritance scalar (e.g.: 1 for autosome, 0.75 for X-linked, 0.25 for Y-linked or mtDNA)
    7. The mutation rate per year for the locus (not per base pair). This can be left blank, but is needed for estimating parameters on demographic scales. If there are multiple STRs in the locus then there can be multiple mutation rates on this line separated by spaces. If the locus is a HapSTR, then the first mutation rate given applies to the sequence portion of the locus with subsequent values corresponding to STR markers included in the locus.
    8. If the mutation rate is given, it can be followed by a range of mutation rates that can be used (with ranges for other loci in the analysis) to set priors on the ratios of mutation rate scalars. The range is entered as an opening parenthesis, the lowest value, a comma, the highest value, and a closing parenthesis (e.g. ‘(0.00001, 0.00004)’). The range must bracket the rate. For a locus with multiple mutation rates and multiple ranges, each range immediately follows its corresponding mutation rate on the line.
  • line 5 - data for gene copy 1 from population 1. The first 10 columns are reserved for the sample name; the sequence, or the allele length for the SSM model, begins in column 11. The sequence for a given sample is given on a single line without gaps. For SSM or HapSTR data, the allele length assumes a step size of 1, so data from STRs whose repeat unit is longer than one base must be converted to the number of repeats (e.g. for the dinucleotide repeat ‘CACACACACACA’ the length would be 6). If the locus uses the SSM model and has multiple STRs, each line holds one integer per STR, separated by spaces. If the locus is HapSTR (joint IS and SSM), the STR data are given first, beginning at column 11, followed by the sequence data. For SSM data, as for other types of data, only one gene copy is represented on each line of the data file, even if the original data consist of diploid genotypes: diploid genotype data must be split into one data line per gene copy.
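
The line layout described above can be sketched in a few lines of Python. This is only an illustration, not part of IM: the function names (parse_locus_line, str_repeat_count, data_line), the dictionary keys, and the sample values are all hypothetical. The sketch assumes that the model letter and the inheritance scalar may both be absent, and that each range is separated from its mutation rate by whitespace.

```python
import re

# Hypothetical helpers for illustration; none of these names come from IM.
_TOKEN = re.compile(r"\([^)]*\)|\S+")    # a "(low, high)" range or a plain item
_MODEL = re.compile(r"^([IHSJ])(\d*)$")  # model letter plus optional STR count


def parse_locus_line(line):
    """Parse one locus information line into a dict (assumed field order)."""
    tokens = _TOKEN.findall(line)
    if len(tokens) < 4:
        raise ValueError("expected at least name, n1, n2 and length")
    locus = {
        "name": tokens[0],
        "n1": int(tokens[1]),
        "n2": int(tokens[2]),
        "length": int(tokens[3]),
        "model": "I",         # IS is the default when no letter is given
        "n_strs": 0,
        "inheritance": None,
        "rates": [],          # list of (rate, (low, high) or None)
    }
    rest = tokens[4:]
    match = _MODEL.match(rest[0]) if rest else None
    if match:
        locus["model"] = match.group(1)
        locus["n_strs"] = int(match.group(2) or 0)
        rest = rest[1:]
    if rest:
        locus["inheritance"] = float(rest[0])
        rest = rest[1:]
    for token in rest:
        if token.startswith("("):
            # A range applies to the mutation rate immediately before it.
            if not locus["rates"]:
                raise ValueError("a range must follow a mutation rate")
            low, high = (float(x) for x in token.strip("()").split(","))
            rate, _ = locus["rates"][-1]
            if not low <= rate <= high:
                raise ValueError("the range must bracket the rate")
            locus["rates"][-1] = (rate, (low, high))
        else:
            locus["rates"].append((float(token), None))
    return locus


def str_repeat_count(allele, motif):
    """Convert an STR allele sequence to its number of repeats (step size 1)."""
    count, remainder = divmod(len(allele), len(motif))
    if remainder or allele != motif * count:
        raise ValueError("allele is not a whole number of motif repeats")
    return count


def data_line(sample_name, data):
    """Build a data line: name padded to 10 columns, data from column 11."""
    return f"{sample_name[:10]:<10}{data}"


if __name__ == "__main__":
    # Made-up locus line, used only to exercise the parser.
    print(parse_locus_line("locusA 10 12 500 H 1 0.0000002 (0.0000001, 0.0000004)"))
    print(str_repeat_count("CACACACACACA", "CA"))
    print(repr(data_line("pop1_a", "ACGT")))
```

A diploid genotype would be written by calling data_line once per gene copy, so a sample with two alleles contributes two lines to the file.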

How to cite

im.1196930586.txt.gz · Last modified: 2008/07/22 13:30 (external edit)