Differences

This shows you the differences between two versions of the page.

--- structure [2007/12/07 10:03] – heidi
+++ structure [2008/07/22 13:31] – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
 ====== STRUCTURE ======
-{{structure.jpg?600}}
+{{structure.jpg?650}}
 \\
-**[[http://pritch.bsd.uchicago.edu/structure.html|STRUCTURE]]**
+**[[http://pritch.bsd.uchicago.edu/structure.html|STRUCTURE]]**\\
 [[http://pritch.bsd.uchicago.edu/software/structure22/readme.pdf|documentation]]
@@ Line 16: / Line 16: @@
   * Windows
   * Mac OS X
 ===== Data type handled =====
-  * SNP
+  * SNP (numeric)
   * Microsatellites
   * RFLP
   * AFLP
   * diploid/haploid
@@ Line 32: / Line 37: @@
 The entire data set is arranged as a matrix in a single file, in which the data for individuals are in rows, and the loci are in columns. For a diploid organism, data for each individual can be stored either as 2 consecutive rows, where each locus is in one column, or in one row, where each locus is in two consecutive columns.
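The two diploid layouts described above carry the same information, so one can be converted into the other. A minimal sketch, assuming each individual's alleles have already been read into plain Python lists (the function names are hypothetical and not part of STRUCTURE):

```python
def one_row_to_two_rows(row):
    """Split a one-row diploid record (two consecutive columns per locus)
    into two rows with one column per locus."""
    if len(row) % 2 != 0:
        raise ValueError("expected two allele columns per locus")
    first_copy = row[0::2]   # first allele at each locus
    second_copy = row[1::2]  # second allele at each locus
    return first_copy, second_copy


def two_rows_to_one_row(first_copy, second_copy):
    """Interleave two rows (one column per locus) into a single row
    with two consecutive columns per locus."""
    if len(first_copy) != len(second_copy):
        raise ValueError("both rows must cover the same number of loci")
    merged = []
    for a, b in zip(first_copy, second_copy):
        merged.extend([a, b])
    return merged
```

For instance, the one-row record ''156 165 101 143'' (two loci) corresponds to the two rows ''156 101'' and ''165 143''.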
-\\
 === rows: ===
   * **Marker Names** (Optional; string):  The first row can contain a list of identifiers for each of the markers in the data set. This row contains L strings of integers or characters, where L is the number of loci.
@@ Line 38: / Line 42: @@
   * **Inter-Marker Distances** (Optional; real numbers): the next row is a set of inter-marker distances, for use with linked loci (contains L real numbers). These should be genetic distances (e.g., centiMorgans), or some proxy for this based, for example, on physical distances. The markers must be in map order within linkage groups. When consecutive markers are from different linkage groups (e.g., different chromosomes), this should be indicated by the value -1. The first marker is also assigned the value -1. All other distances are non-negative.
   * **Phase Information** (Optional; diploid data only; real number in the range [0,1]): This is for use with the linkage model only. A single row of L probabilities that appears after the genotype data for each individual. There are two alternative representations for the phase information:
-    - the two rows of data for an individual are assumed to correspond to the paternal and maternal contributions, The phase line indicates the probability that the ordering is correct at the current marker (set MARKOVPHASE=0)respectively.
+    - the two rows of data for an individual are assumed to correspond to the paternal and maternal contributions, respectively. The phase line indicates the probability that the ordering is correct at the current marker (set MARKOVPHASE=0).
     - the phase line indicates the probability that the phase of one allele relative to the previous allele is correct (set MARKOVPHASE=1)
 The first entry should be filled in with 0.5 to fill out the line to L entries. For example, the following data input would represent the information from a male with 5 unphased autosomal microsatellite loci followed by three X chromosome loci, using the maternal/paternal phase model (the 0.5 indicates that the autosomal loci are unphased, and the 1.0s indicate that the X chromosome loci have been maternally inherited with probability 1.0, and hence are phased):<code>
 156 165 101 143 105 104 101
@@ Line 56: / Line 60: @@
   * **Extra Columns** (Optional; string): It may be convenient for the user to include additional data in the input file which are ignored by the program. These go here, and may be strings of integers or characters.
   * **Genotype Data** (Required; integer): Each allele at a given locus should be coded by a unique integer (e.g., microsatellite repeat score).
 === Missing genotype data: ===
 Missing data should be indicated by a number that doesn't occur elsewhere in the data (often -9 by convention). The missing-data value is set along with the other parameters describing the characteristics of the data set.
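Before running the program it can help to check how much of each locus is coded as missing. A minimal sketch, assuming the genotype matrix has already been parsed into a list of integer rows and that -9 is the chosen missing-data value (the helper name and the sample values are hypothetical):

```python
MISSING = -9  # assumed missing-data value; use whatever the data set declares


def missing_fraction_per_locus(genotypes, missing=MISSING):
    """Return, for each locus (column), the fraction of rows coded as missing."""
    if not genotypes:
        raise ValueError("genotype matrix is empty")
    n_loci = len(genotypes[0])
    counts = [0] * n_loci
    for row in genotypes:
        if len(row) != n_loci:
            raise ValueError("all rows must have the same number of loci")
        for j, allele in enumerate(row):
            if allele == missing:
                counts[j] += 1
    return [count / len(genotypes) for count in counts]


# Illustrative values only, not taken from a real data set.
example = [
    [156, -9, 101],
    [156, 165, -9],
    [-9, 165, -9],
    [150, 160, 101],
]
fractions = missing_fraction_per_locus(example)
```

The check only counts the sentinel. It cannot tell whether the same number also occurs as a genuine allele code, so that still has to be verified against the raw scoring data.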
+\\
 ==== example: ====
-example for the genotype data:
+example for genotype data:
 <code>
             loc_a  loc_b  loc_c  loc_d  loc_e