
STRUCTURE





Version 2.3.3 (January 2010)
The program structure implements a model-based clustering method for inferring population structure from genotype data consisting of unlinked markers. Its applications include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.

Program information

Data type handled

Input Files

The entire data set is arranged as a matrix in a single file, with individuals in rows and loci in columns. For a diploid organism, each individual's data can be stored either as two consecutive rows, with each locus in one column, or as one row, with each locus in two consecutive columns.
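As a rough illustration of the two layouts, the Python sketch below merges a file written with two rows per individual into the one-row layout. It assumes whitespace-separated columns, no marker-name row, and a fixed number of leading non-genotype columns (the hypothetical parameter `n_meta`, here the label and population columns). The function name is likewise hypothetical and not part of structure.

```python
def two_rows_to_one_row(lines, n_meta=2):
    """Merge pairs of rows (two-rows-per-individual layout) into one row each.

    Hypothetical helper. Assumes whitespace-separated columns, no marker-name
    row, and that the first n_meta columns (e.g. label and population) are
    repeated identically on both rows of an individual.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2:
        raise ValueError("expected an even number of rows (two per individual)")
    merged = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[:n_meta] != second[:n_meta]:
            raise ValueError(f"rows belong to different individuals: {first[:n_meta]} vs {second[:n_meta]}")
        if len(first) != len(second):
            raise ValueError(f"rows for {first[0]} have different lengths")
        alleles = []
        for a, b in zip(first[n_meta:], second[n_meta:]):
            alleles.extend([a, b])  # each locus becomes two consecutive columns
        merged.append(first[:n_meta] + alleles)
    return merged


merged = two_rows_to_one_row([
    "George   1   -9     145     66     0     92",
    "George   1   -9     -9      64     0     94",
])
print(" ".join(merged[0]))  # George 1 -9 -9 145 -9 66 64 0 0 92 94
```

The length and label checks are there because a silently misaligned pair of rows would otherwise produce a plausible-looking but wrong genotype.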

Rows:

Phase information (optional): the first entry should be filled in with 0.5 to fill out the line to L entries, where L is the number of loci. For example, the following data input would represent the information from a male with five unphased autosomal microsatellite loci followed by three X-chromosome loci, using the maternal/paternal phase model. The 0.5s indicate that the autosomal loci are unphased, and the 1.0s indicate that the X-chromosome loci have been maternally inherited with probability 1.0 and hence are phased:

102 156 165 101 143 105 104 101
100 148 163 101 143  -9  -9  -9
0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
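A phase row like the third line of the example above can be generated mechanically. The sketch below assumes the maternal/paternal model exactly as described in the example (0.5 for unphased autosomal loci, 1.0 for the X-chromosome loci of a male) and checks that the row has exactly L entries, each a probability. Both function names are hypothetical.

```python
def male_phase_row(n_autosomal, n_x):
    """Phase row for a male under the maternal/paternal model (sketch).

    Unphased autosomal loci get 0.5; the haploid X loci get 1.0, i.e. the
    listed allele is maternal with probability 1.0.
    """
    return [0.5] * n_autosomal + [1.0] * n_x


def check_phase_row(row, n_loci):
    """Check that a phase row has one probability per locus (L entries)."""
    if len(row) != n_loci:
        raise ValueError(f"phase row has {len(row)} entries, expected L = {n_loci}")
    if any(not 0.0 <= p <= 1.0 for p in row):
        raise ValueError("phase entries must be probabilities in [0, 1]")
    return row


row = check_phase_row(male_phase_row(5, 3), n_loci=8)
print(" ".join(f"{p:.1f}" for p in row))  # 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
```

Note that the second genotype row for the X loci in the example holds the missing-data value (-9), since a male carries only one X allele.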

Each individual occupies one or more rows.

Individual/genotype data:

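Each row of individual data contains the following elements, which form columns in the data file: an optional label identifying the individual, optional integer columns such as a user-defined population identifier, and the genotype data for the L loci. In the example below, the first column is the label, the second the population identifier, and the remaining five columns are the genotypes at loci loc_a to loc_e.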

Missing genotype data:

Missing data should be indicated by a number that doesn't occur elsewhere in the data (often -9 by convention). The missing-data value is set along with the other parameters describing the characteristics of the data set.
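When converting data from another tool, the missing-data code must be a number that cannot be mistaken for an allele. The sketch below, a hypothetical helper rather than part of structure, replaces an assumed source marker (`"NA"`) with -9, falling back to the next lower unused integer if -9 already occurs as an allele. It assumes integer allele codes. The returned code is the value that would then be given as the missing-data parameter (`MISSING` in structure's `mainparams` file).

```python
def recode_missing(rows, source_missing="NA", preferred=-9):
    """Replace a source missing marker with an integer code absent from the data.

    rows: list of lists of allele strings (genotype columns only).
    Returns (recoded_rows, code). The code is `preferred` unless that value
    already occurs as an allele, in which case the next lower unused integer
    is chosen, so the code never collides with real data.
    """
    observed = {int(a) for row in rows for a in row if a != source_missing}
    code = preferred
    while code in observed:
        code -= 1
    recoded = [[str(code) if a == source_missing else a for a in row] for row in rows]
    return recoded, code


data = [["106", "NA", "68"], ["110", "148", "NA"]]
recoded, code = recode_missing(data)
print(code)     # -9
print(recoded)  # [['106', '-9', '68'], ['110', '148', '-9']]
```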


Example of genotype data:

            loc_a  loc_b  loc_c  loc_d  loc_e
George   1   -9     145     66     0     92
George   1   -9     -9      64     0     94
Paula    1   106    142     68     1     92
Paula    1   106    148     64     0     94
Matthew  2   110    145     -9     0     92
Matthew  2   110    148     66     1     -9
Bob      2   108    142     64     1     94
Bob      2   -9     142     -9     0     94
Anja     1   112    142     -9     1     -9
Anja     1   114    142     66     1     94
Peter    1   -9     145     66     0     -9
Peter    1   110    145     -9     1     -9
Carsten  2   108    145     62     0     -9
Carsten  2   110    145     64     1     92

How to cite

The basic algorithm:


Extensions to the method: