lamarc

Differences

This shows you the differences between two versions of the page.

lamarc [2007/12/13 13:59] heidi
lamarc [2008/07/22 13:31] (current) – external edit 127.0.0.1
Line 22: Line 22:
   * Microsatellites   * Microsatellites
   * electrophoretic data   * electrophoretic data
 +
 +
 +
 +
 +
 +
 +
 +
 +
 +
 +
 +
  
  
Line 30: Line 42:
 ===== Input Files ===== ===== Input Files =====
 === LAMARC File Converter: === === LAMARC File Converter: ===
-can convert [[PHYLIP]], RECOMBINE and [[MIGRATE]] files to a LAMARC XML file
+can convert [[PHYLIP]], RECOMBINE and [[MIGRATE]] files to a LAMARC [[XML presentation|XML file]]
  
 === LAMARC XML file: === === LAMARC XML file: ===
Line 58: Line 70:
     * if you list <population> tags under more than one region, they will be matched by means of their name attributes, so the names are not optional     * if you list <population> tags under more than one region, they will be matched by means of their name attributes, so the names are not optional
     * <individual>: represents all the data for that region that comes from a single biological individual (one or more sets). Individuals can have a name attribute (optional)     * <individual>: represents all the data for that region that comes from a single biological individual (one or more sets). Individuals can have a name attribute (optional)
 +      * <sample>: contains the actual sequences
 +      * <phase> (optional): appears within the <sample> tag and indicates uncertainty about the phase of certain sites. It has an obligatory attribute "type", which can be either "known" (followed by a list of all sites whose phase is known and therefore need not be reconsidered during the run) or "unknown" (followed by a list of all sites whose phase is unknown and thus should be reconsidered). Valid values are site numbers between the value of the offset for that segment (which defaults to 1) and the length of the segment plus the offset. If the segment is longer than the number of markers you have (as is the case for SNP data), valid values here are the same values used for the 'locations' tag in the 'block' section.
 +      * <datablock> (one per segment per sample):
 +        * contains the sequences themselves
 +        * Each datablock must have an attribute indicating the type of data it contains (type="DNA" for full DNA or RNA sequences, type="SNP" for SNP sequences, and type="Microsat" for microsatellites)
 +        * Sequence data must be aligned and of the same length for all samples within a region
 +        * "Unknown nucleotide" codes (X, N or -) can be used to fill in missing or unknown sequence
 +        * Upper- and lowercase nucleotide symbols are treated equivalently
 +        * Deletions should be coded as unknown and will be treated as unknown
 +        * Microsatellite data are coded as the number of repeats, with "?" standing for unknown data. Successive microsatellites within the same region are separated by blank spaces.
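The datablock rules above can be sketched as a small checker. This is only an illustration, not part of LAMARC: the function name `parse_datablock` is hypothetical, and the set of accepted base symbols (A, C, G, T, U) is an assumption, since the text names only the unknown codes X, N and -.

```python
DNA_SYMBOLS = set("ACGTU")    # assumption: plain bases only
UNKNOWN_SYMBOLS = set("XN-")  # "unknown nucleotide" codes named in the text


def parse_datablock(kind, text):
    """Return a list of site values for one datablock, or raise ValueError."""
    if kind in ("DNA", "SNP"):
        # Spaces and line breaks are layout only; case is ignored.
        sites = "".join(text.split()).upper()
        bad = sorted(set(sites) - DNA_SYMBOLS - UNKNOWN_SYMBOLS)
        if bad:
            raise ValueError(f"unexpected symbols: {bad}")
        return list(sites)
    if kind == "Microsat":
        values = []
        for token in text.split():
            if token == "?":
                values.append(None)  # unknown repeat count
            elif token.isdecimal():
                values.append(int(token))
            else:
                raise ValueError(f"bad repeat count: {token!r}")
        return values
    raise ValueError(f"unknown datablock type: {kind!r}")


dna_sites = parse_datablock("DNA", "ac gt\nNx-")
repeat_counts = parse_datablock("Microsat", "8 9 13 7 ? 23")
```

Unknown microsatellite values are returned as `None` so that later code cannot mistake them for a repeat count.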
  
 +**examples:** 
 +  * a minimal DNA data block describing a single region, a single segment, a single population, and two individuals with a single haplotype each. Note that while the two blocks of data are formatted differently, they contain the same number of bases; this is required, since all blocks corresponding to a single segment must be the same length: <code xml>
 +<data>
 +  <region name="Alcohol dehydrogenase">
 +    <population name="Seattle">
 +      <individual name="Mary">
 +        <sample>
 +          <datablock type="DNA">
 +            CTTGTAACCTAATGGCTTCCGAGATGGACTAGTGAGCCGCTTTCTC
 +            TACACCAACGCAGCACATGACGGTCTTACATGCGGAGCCCGCTCAA
 +          </datablock>
 +        </sample>
 +      </individual>
 +      <individual name="Jon">
 +        <sample>
 +          <datablock type="DNA">
 +            CTTGTAACCTAATGGCTTCCGA
 +            GATGGACTAGTGAGCCGCTTTCTC
 +            TACACCAACGCAGCACATGACG
 +            GTCTTACATGCGGAGCCCGCTCAA
 +          </datablock>
 +        </sample>
 +      </individual>
 +    </population>
 +  </region>
 +</data>
 +</code>
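The equal-length requirement can be checked before running LAMARC with a short script. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; the helper name `region_block_lengths` and the shortened `EXAMPLE` data are made up for illustration. It assumes one segment per region, as in the example above.

```python
import xml.etree.ElementTree as ET


def region_block_lengths(xml_text):
    """Return {region name: sorted list of distinct datablock lengths}."""
    root = ET.fromstring(xml_text)
    result = {}
    for region in root.iter("region"):
        lengths = set()
        for block in region.iter("datablock"):
            # Count bases only; spaces and line breaks are formatting.
            lengths.add(len("".join((block.text or "").split())))
        result[region.get("name")] = sorted(lengths)
    return result


# Hypothetical, shortened data laid out like the example above.
EXAMPLE = """
<data>
  <region name="Alcohol dehydrogenase">
    <population name="Seattle">
      <individual name="Mary">
        <sample><datablock type="DNA">CTTGTAACCTAA TGGC</datablock></sample>
      </individual>
      <individual name="Jon">
        <sample><datablock type="DNA">
          CTTGTAAC
          CTAATGGC
        </datablock></sample>
      </individual>
    </population>
  </region>
</data>
"""

lengths = region_block_lengths(EXAMPLE)
# A region with more than one distinct length has misaligned blocks.
mismatched = {name: ls for name, ls in lengths.items() if len(ls) > 1}
```

A file with several segments per region would need the lengths grouped per segment rather than per region.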
 +  * a microsatellite data block which also illustrates the use of multiple samples per individual. In this example, "Mary" is a heterozygote for the second microsatellite and a homozygote for the other five: <code xml>
 +<data>
 +  <region name="Alcohol dehydrogenase">
 +    <population name="Seattle">
 +      <individual name="Mary">
 +        <sample>
 +          <datablock type="Microsat">
 +              7 8 14 7 9 21
 +          </datablock>
 +        </sample>
 +        <sample>
 +          <datablock type="Microsat">
 +              7 9 14 7 9 21
 +          </datablock>
 +        </sample>
 +      </individual>
 +      <individual name="Jon">
 +        <sample>
 +          <datablock type="Microsat">
 +              7 9 14 7 10 23
 +          </datablock>
 +        </sample>
 +        <sample>
 +          <datablock type="Microsat">
 +              8 9 13 7 ? 23
 +          </datablock>
 +        </sample>
 +      </individual>
 +    </population>
 +  </region>
 +</data>
 +</code>
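To make the heterozygote remark concrete, the two samples of each individual can be compared locus by locus. This is a sketch only: the function name `heterozygous_loci` is hypothetical, it assumes exactly two samples per individual as in the example above, and it skips loci where either value is "?".

```python
import xml.etree.ElementTree as ET


def heterozygous_loci(individual):
    """Return 1-based positions where an individual's two samples differ."""
    samples = [
        (block.text or "").split()
        for block in individual.iter("datablock")
        if block.get("type") == "Microsat"
    ]
    if len(samples) != 2:
        raise ValueError("this sketch expects exactly two samples")
    first, second = samples
    return [
        position
        for position, (a, b) in enumerate(zip(first, second), start=1)
        # Loci with an unknown repeat count ("?") are skipped.
        if "?" not in (a, b) and a != b
    ]


MARY = ET.fromstring("""
<individual name="Mary">
  <sample><datablock type="Microsat">7 8 14 7 9 21</datablock></sample>
  <sample><datablock type="Microsat">7 9 14 7 9 21</datablock></sample>
</individual>
""")

JON = ET.fromstring("""
<individual name="Jon">
  <sample><datablock type="Microsat">7 9 14 7 10 23</datablock></sample>
  <sample><datablock type="Microsat">8 9 13 7 ? 23</datablock></sample>
</individual>
""")

mary_loci = heterozygous_loci(MARY)
jon_loci = heterozygous_loci(JON)
```

For "Mary" only the second locus differs between the two samples, which matches the description of the example.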
  
  
-   
  
  
lamarc.1197550747.txt.gz · Last modified: 2008/07/22 13:30 (external edit)