Differences

This shows you the differences between two versions of the page.

--- lamarc [2007/12/13 13:34] – heidi
+++ lamarc [2008/07/22 13:31] (current) – external edit 127.0.0.1
@@ Line 22: / Line 22: @@
   * Microsatellites
   * electrophoretic data
@@ Line 29: / Line 42: @@
 ===== Input Files =====
 === LAMARC File Converter: ===
-can convert [[PHYLIP]], RECOMBINE and [[MIGRATE]] files to a LAMARC XML file
+can convert [[PHYLIP]], RECOMBINE and [[MIGRATE]] files to a LAMARC [[XML presentation|XML file]]
 === LAMARC XML file: ===
@@ Line 37: / Line 50: @@
 contains the actual molecular data, and additional information used to interpret it
   * enclosed in <data> tags
+  * <region>:
+    * divides molecular data into "regions"
+    * a region holds all available genetic information that is closely linked on the same chromosome and has a known map
+    * Use multiple regions for data composed of several disconnected bits or bits whose connections are not known
+    * region's name: optional name attribute
+  * <effective-popsize> (optional):
+    * specify a different relative effective population size for each <region>
+  * <spacing>:
+    * Information about the relative position of segments
+  * <block>:
+    * Each segment is indicated by this tag
+    * gives information about the position of the segment itself and the positions of the markers within the segment
+    * <length>: indicates the total length of the segment (important for SNPs)
+    * <map-position>: gives the position of this segment on an overall map of the region (the point at which sequencing or scanning began)
+    * <locations>: a list of marker positions within the region
+    * <offset>: the origin of the segment's numbering system with respect to the boundaries of the region
+  * <population>:
+    * Within each region you can list various populations
+    * if you list <population> tags under more than one region, they will be matched by means of their name attributes, so the names are not optional
+    * <individual>: represents all the data for that region that comes from a single biological individual (one or more sets). Individuals can have a name attribute (optional)
+      * <sample>: contains the actual sequences
+      * <phase> (optional): placed within the sample tag to indicate uncertainty about the phase of certain sites. It has an obligatory attribute "type", which is either "known" (followed by a list of all sites whose phase is known and therefore need not be reconsidered during the run) or "unknown" (followed by a list of all sites whose phase is unknown and thus should be reconsidered). Valid values are site numbers between the value of the offset for that segment (which defaults to 1) and the length of the segment plus the offset. If the segment is longer than the number of markers you have (as is the case for SNP data), valid values are the same values used for the 'locations' tag in the 'block' section
+      * <datablock> (one per segment per sample):
+        * sequences themselves
+        * Each datablock must have an attribute indicating the type of data it contains (type="DNA" for full DNA or RNA sequences, type="SNP" for SNP sequences, and type="Microsat" for microsatellites)
+        * Sequence data must be aligned and of the same length for all samples within a region
+        * "Unknown nucleotide" codes (X, N or -) can be used to fill in missing or unknown sequence
+        * Upper- and lowercase nucleotide symbols are treated equivalently
+        * Deletions should be coded as unknown and will be treated as unknown
+        * Microsatellite data are coded as the number of repeats, with "?" standing for unknown data. Successive microsatellites within the same region are separated by blank spaces.
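The datablock rules listed above (case-insensitive symbols, unknown codes X, N and -, equal length for all samples) can be sketched as a small pre-flight check. This is only an illustrative sketch and not part of LAMARC: the helper names `normalize_dna`, `count_unknown` and `same_length` are hypothetical, and the code assumes that all whitespace inside a DNA datablock is insignificant.

```python
# Hypothetical helpers for checking DNA datablock text before writing it
# into a LAMARC XML file; none of these names come from LAMARC itself.

UNKNOWN_CODES = set("XN-")  # "unknown nucleotide" codes named in the list above


def normalize_dna(block_text):
    """Drop all whitespace and fold to uppercase.

    Assumption: line breaks and indentation inside a datablock carry no
    meaning, so differently wrapped blocks can be compared directly.
    """
    return "".join(block_text.split()).upper()


def count_unknown(block_text):
    """Count positions coded as unknown (X, N or -), ignoring case."""
    return sum(1 for base in normalize_dna(block_text) if base in UNKNOWN_CODES)


def same_length(blocks):
    """Return True if every block has the same number of symbols."""
    lengths = {len(normalize_dna(block)) for block in blocks}
    return len(lengths) <= 1
```

A check like `same_length([mary_block, jon_block])` could be run on the text of each sample's datablock for a segment before the file is handed to LAMARC.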
+**examples:**
+  * minimal DNA data block describing a single region, a single segment, a single population, and two individuals with a single haplotype each. Note that while the two blocks of data are differently formatted, they contain the same number of bases; this is required since all blocks corresponding to a single segment must be of the same length: <code xml>
+<data>
+  <region name="Alcohol dehydrogenase">
+    <population name="Seattle">
+      <individual name="Mary">
+        <sample>
+          <datablock type="DNA">
+            CTTGTAACCTAATGGCTTCCGAGATGGACTAGTGAGCCGCTTTCTC
+            TACACCAACGCAGCACATGACGGTCTTACATGCGGAGCCCGCTCAA
+          </datablock>
+        </sample>
+      </individual>
+      <individual name="Jon">
+        <sample>
+          <datablock type="DNA">
+            CTTGTAACCTAATGGCTTCCGA
+            GATGGACTAGTGAGCCGCTTTCTC
+            TACACCAACGCAGCACATGACG
+            GTCTTACATGCGGAGCCCGCTCAA
+          </datablock>
+        </sample>
+      </individual>
+    </population>
+  </region>
+</data>
+</code>
+  * a microsatellite data block which also illustrates the use of multiple samples per individual. In this example, "Mary" is a heterozygote for the first microsatellite and a homozygote for the other four: <code xml>
+<data>
+  <region name="Alcohol dehydrogenase">
+    <population name="Seattle">
+      <individual name="Mary">
+        <sample>
+          <datablock type="Microsat">
+8 14 7 9 21
+          </datablock>
+        </sample>
+        <sample>
+          <datablock type="Microsat">
+9 14 7 9 21
+          </datablock>
+        </sample>
+      </individual>
+      <individual name="Jon">
+        <sample>
+          <datablock type="Microsat">
+9 14 7 10 23
+          </datablock>
+        </sample>
+        <sample>
+          <datablock type="Microsat">
+9 13 7 ? 23
+          </datablock>
+        </sample>
+      </individual>
+    </population>
+  </region>
+</data>
+</code>
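As a rough cross-check of the microsatellite example above, the following sketch reads a `<data>` block with Python's standard `xml.etree.ElementTree` and reports, per individual, the 1-based positions at which the samples carry different known repeat counts. It is an illustration only and not part of LAMARC: the function name `heterozygous_loci` is hypothetical, and the code assumes exactly one datablock per sample and treats "?" as unknown rather than as a distinct allele.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the microsatellite example, used as test input.
EXAMPLE = """
<data>
  <region name="Alcohol dehydrogenase">
    <population name="Seattle">
      <individual name="Mary">
        <sample><datablock type="Microsat">8 14 7 9 21</datablock></sample>
        <sample><datablock type="Microsat">9 14 7 9 21</datablock></sample>
      </individual>
      <individual name="Jon">
        <sample><datablock type="Microsat">9 14 7 10 23</datablock></sample>
        <sample><datablock type="Microsat">9 13 7 ? 23</datablock></sample>
      </individual>
    </population>
  </region>
</data>
"""


def heterozygous_loci(data_xml):
    """Map each individual's name to the positions where its samples differ."""
    root = ET.fromstring(data_xml)
    result = {}
    for individual in root.iter("individual"):
        # Assumption: one datablock per sample (a single segment).
        samples = [
            sample.find("datablock").text.split()
            for sample in individual.findall("sample")
        ]
        loci = []
        for position, alleles in enumerate(zip(*samples), start=1):
            known = {allele for allele in alleles if allele != "?"}
            if len(known) > 1:  # at least two distinct known repeat counts
                loci.append(position)
        result[individual.get("name")] = loci
    return result


print(heterozygous_loci(EXAMPLE))  # {'Mary': [1], 'Jon': [2]}
```

Jon's fourth microsatellite is not reported because one of the two values is unknown; whether such a locus should count as heterozygous is a design choice this sketch leaves to the caller.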