====== Arlequin ======
{{arlequin_logo.jpg?200}}

\\
**[[http://popgen.unibe.ch/software/arlequin35/|Arlequin]]**\\
[[http://popgen.unibe.ch/software/arlequin35/man/Arlequin35.pdf|manual]]


\\
Arlequin ver 3.5 (released 24 February 2010)\\
The goal of Arlequin is to provide the average user in population genetics with a fairly large set of basic methods and statistical tests for extracting information on the genetic and demographic features of a collection of population samples.\\
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods.\\
Arlequin computes indices of genetic diversity, F-statistics and genetic distances between populations; performs exact tests of Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD) and population differentiation; tests selective neutrality within populations; performs Mantel tests; estimates gametic phase from multilocus genotypes; and estimates demographic parameters from the mismatch distribution.
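As a small illustration of the simplest of these quantities, the sketch below computes the gene diversity at one locus with Nei's unbiased estimator, H = n/(n-1) × (1 - sum of p_i²), where n is the number of gene copies and p_i the allele frequencies. This is the textbook formula written in Python, not Arlequin's own code; it assumes no missing data, and the allele list is hypothetical.

<code python>
from collections import Counter

def gene_diversity(alleles):
    """Unbiased gene diversity at one locus: n/(n-1) * (1 - sum(p_i^2)).
    `alleles` holds one allele code per gene copy (no missing data)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)

# Hypothetical sample of six gene copies at a single locus
sample = ["0301", "0200", "0301", "0301", "0502", "0200"]
print(round(gene_diversity(sample), 4))  # 0.7333
</code>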


===== Program information =====
  * program written in C++
  * Windows version (XP, Vista, 7)
  * Linux version

\\

===== Data type handled =====
  * DNA sequences
  * RFLP
  * SNP
  * Microsatellite
  * Standard data
  * Allele frequency data

\\
**In haplotypic form:** 
  * haplotypes (i.e. combination of alleles at one or more loci)
  * haploid/diploid

\\
**In genotypic form:**
  * genotypes
  * diploid
  * known/unknown gametic phase 
  * with/without recessive alleles

\\


===== Input Files =====
Input files contain a description of the properties of the data as well as the raw data themselves. They should have the ''*.arp'' extension (for ARlequin Project).

\\
Input files are structured into two main sections:
  * Profile section (mandatory) 
  * Data section (mandatory): 
    * Haplotype list (optional) 
    * Distance matrices (optional) 
    * Samples (mandatory) 
    * Genetic structure (optional) 
    * Mantel tests (optional) 
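The section layout above can be checked with a short script before loading a project into Arlequin. The sketch below is a hypothetical helper (''split_sections'' is not part of Arlequin) that groups the lines of a project file under their ''[Section]'' or ''%%[[Subsection]]%%'' header. It assumes that headers stand alone on their line and that ''#'' always starts a comment, so a ''#'' inside a quoted string would be misread.

<code python>
import re

# [Name] opens a top-level section, [[Name]] a subsection
HEADER = re.compile(r"^\s*(\[\[?)([A-Za-z]+)\]\]?\s*$")

def split_sections(text):
    """Return (level, name, body_lines) for every section header, in order.
    level is 1 for [Name] and 2 for [[Name]]."""
    sections = []
    for line in text.splitlines():
        code = line.split("#", 1)[0]      # drop '#' comments before matching
        m = HEADER.match(code)
        if m:
            sections.append((len(m.group(1)), m.group(2), []))
        elif sections:
            sections[-1][2].append(line)
    return sections

example = """[Profile]
  Title="demo"
  NbSamples=1
[Data]
[[Samples]]
  SampleName="s1"
"""
for level, name, body in split_sections(example):
    print(level, name, len(body))
</code>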

\\

==== Example: ====
The following small example is a project file containing four populations. The data type is STANDARD genotypic data with unknown gametic phase: 
<code>
[Profile] 
Title="Fake HLA data" 
  NbSamples=4 
  GenotypicData=1 
  GameticPhase=0 
  DataType=STANDARD 
  LocusSeparator=WHITESPACE 
  MissingData='?' 
[Data] 
[[Samples]] 
  SampleName="A sample of 6 Algerians" 
  SampleSize=6 
  SampleData={ 
   1 1 1104 0200 
       0700 0301 
   3 3 0302 0200 
       1310 0402 
   4 2 0402 0602 
       1502 0602 
  } 
  SampleName="A sample of 11 Bulgarians" 
  SampleSize=11 
  SampleData={ 
   1 1 1103 0301 
       0301 0200 
   2 4 1101 0301 
       0700 0200 
   3 1 1500 0502 
       0301 0200 
   4 1 1103 0301 
       1202 0301 
   5 1 0301 0200 
       1500 0601 
   6 3 1600 0502 
       1301 0603 
  } 
  SampleName="A sample of 12 Egyptians" 
  SampleSize=12 
  SampleData={ 
   1 2 1104 0301 
       1600 0502 
   3 1 1303 0301 
       1101 0502 
   4 3 1502 0601 
       1500 0602 
   6 1 1101 0301 
       1101 0301 
   8 4 1302 0502 
       1101 0609 
   9 1 1500 0302 
       0402 0602 
  } 
  SampleName="A sample of 8 French" 
  SampleSize=8 
  SampleData={ 
   219 1 0301 0200 
         0101 0501 
   239 2 0301 0200 
         0301 0200 
   249 1 1302 0604 
         1500 0602 
   250 3 1401 0503 
         1301 0603 
   254 1 1302 0604 
         ?    ?
  } 
[[Structure]] 
  StructureName="My population structure" 
  NbGroups=2 
  Group={ 
   "A sample of 6 Algerians" 
   "A sample of 12 Egyptians" 
  } 
  Group={ 
   "A sample of 11 Bulgarians" 
   "A sample of 8 French" 
  } 
</code>
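In the example, each ''SampleSize'' equals the sum of the genotype frequencies in ''SampleData'' (for instance 1 + 3 + 2 = 6 for the Algerian sample). A hypothetical consistency check is sketched below. It assumes that ''SampleName'', ''SampleSize'' and ''SampleData'' appear in that order, that frequencies are absolute integer counts, and that genotypic records take exactly two lines each (one line per record for haplotypic data).

<code python>
import re

SAMPLE = re.compile(
    r'SampleName\s*=\s*"([^"]*)"\s*SampleSize\s*=\s*(\d+)\s*'
    r'SampleData\s*=\s*\{(.*?)\}', re.S)

def check_sample_sizes(arp_text, genotypic=True):
    """Return (name, declared SampleSize, summed frequencies) for each sample."""
    results = []
    for name, size, data in SAMPLE.findall(arp_text):
        lines = [ln.split("#", 1)[0].strip() for ln in data.splitlines()]
        lines = [ln for ln in lines if ln]
        step = 2 if genotypic else 1      # genotypes span two lines
        counted = sum(int(ln.split()[1]) for ln in lines[::step])
        results.append((name, int(size), counted))
    return results

demo = '''SampleName="pop A"
SampleSize=3
SampleData={
 1 1 0301 0200
     0700 0301
 2 2 1104 0200
     0301 0200
}'''
print(check_sample_sizes(demo))  # [('pop A', 3, 3)]
</code>

Any tuple whose last two values differ points to a sample whose declared size does not match its data.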


==== Profile section: ====
  * Title (string within ""): ''Title="title xy"''
  * Number of samples (int 1-1000): ''NbSamples = 3''
  * Type of data (DNA, RFLP, MICROSAT, STANDARD, FREQUENCY): ''DataType = DNA''
  * Haplotypic/genotypic data (0/1): ''GenotypicData = 0''
\\ 
  * Optional keywords (default values underlined):
    * locus separator (__WHITESPACE__, TAB, NONE, …): ''LocusSeparator = TAB''
    * gametic phase known/unknown (__1__/0): ''GameticPhase = 1''
    * recessive/co-dominant alleles (1/__0__): ''RecessiveData = 1''
    * code for the recessive allele (string within "", default __"null"__): ''RecessiveAllele = "xxx"''
    * code for missing data (single character within double or single quotes, default __'?'__): ''MissingData = "$"''
    * frequencies as absolute/relative values (__ABS__/REL): ''Frequency = ABS''
    * significant digits for haplotype frequency outputs (real number between 1e-7 and 1e-2, default __1e-5__): ''FrequencyThreshold = 0.00001''
    * convergence criterion for the EM algorithm (real number between 1e-12 and 1e-7, default __1e-7__): ''EpsilonValue = 1e-10''
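The keyword list above maps naturally onto a dictionary of defaults. The sketch below is a hypothetical ''parse_profile'' helper (not an Arlequin function) that reads ''Key=Value'' lines, strips the surrounding quotes, fills in the default values listed above for the optional keywords, and complains if a mandatory keyword is missing. It assumes one keyword per line and that ''#'' starts a comment, so a value such as ''MissingData = '#' '' would not be handled.

<code python>
# Defaults of the optional keywords, as listed above
DEFAULTS = {
    "LocusSeparator": "WHITESPACE",
    "GameticPhase": "1",
    "RecessiveData": "0",
    "RecessiveAllele": "null",
    "MissingData": "?",
    "Frequency": "ABS",
    "FrequencyThreshold": "1e-5",
    "EpsilonValue": "1e-7",
}
REQUIRED = ("Title", "NbSamples", "DataType", "GenotypicData")

def parse_profile(lines):
    """Parse the Key=Value lines of a [Profile] section into a dict of strings."""
    profile = dict(DEFAULTS)
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        profile[key] = value.strip("\"'")     # drop surrounding quotes
    missing = [k for k in REQUIRED if k not in profile]
    if missing:
        raise ValueError(f"missing mandatory keywords: {missing}")
    return profile

profile = parse_profile([
    'Title="Fake HLA data"', "NbSamples=4", "GenotypicData=1",
    "GameticPhase=0", "DataType=STANDARD", "MissingData='?'",
])
print(profile["GameticPhase"], profile["Frequency"])  # 0 ABS
</code>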


==== Data section: ====
=== Haplotype list (optional): ===
Defines a list of haplotypes, either inline in the project file (internal) or in a separate file (external).

  * internal:
<code>
[[HaplotypeDefinition]] #start the section of Haplotype definition 
  HaplListName="list1"  # give this list any name you wish 
  HaplList={ 
   h1 A T              #on each line, the name of the haplotype is 
   h2 G C              # followed by its definition. 
   h3 A G 
   h4 A A 
   h5 G G 
  } 
</code>
  * external:
<code>
[[HaplotypeDefinition]] #start the section of Haplotype definition 
  HaplListName="list1"  # give this list any name you wish 
  HaplList = EXTERN "hapl_file.hap" 
</code>
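Once haplotypes are defined in a list, samples refer to them by name only. The hypothetical helper below reads the body of an internal ''HaplList'' into a dictionary from haplotype name to allelic definition, rejecting duplicate names. It assumes alleles are separated by whitespace, as in the internal example; an unseparated DNA sequence would come back as a single token.

<code python>
def parse_hapl_list(body):
    """Map each haplotype name to its list of alleles.
    body: the text between the braces of HaplList={ ... }."""
    haplotypes = {}
    for line in body.splitlines():
        tokens = line.split("#", 1)[0].split()   # ignore '#' comments
        if not tokens:
            continue
        name, alleles = tokens[0], tokens[1:]
        if name in haplotypes:
            raise ValueError(f"duplicate haplotype name: {name}")
        haplotypes[name] = alleles
    return haplotypes

body = """
   h1 A T              #on each line, the name of the haplotype is
   h2 G C              # followed by its definition.
   h3 A G
"""
haps = parse_hapl_list(body)
print(haps["h2"])  # ['G', 'C']
</code>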

=== Distance matrix (optional): ===
A matrix of genetic distances between haplotypes can be specified, either internally or externally.
  * internal:
<code>
[[DistanceMatrix]]      #start the distance matrix definition section 
   MatrixName= "none"  # name of the distance matrix 
   MatrixSize= 4        # size = number of lines of the distance matrix 
   MatrixData={        
     h1 h2 h3 h4        # labels of the distance matrix (identifier of the 
     0.00000            # haplotypes) 
     2.00000 0.00000 
     1.00000 2.00000 0.00000 
     1.00000 2.00000 1.00000 0.00000 
   } 
</code>
  * external:
<code>
[[DistanceMatrix]]      #start the distance matrix definition section 
   MatrixName= "none"  # name of the distance matrix 
   MatrixSize= 4        # size = number of lines of the distance matrix 
   MatrixData= EXTERN "mat_file.dis" 
</code>
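The matrix is entered as a lower triangle, diagonal included, so row i holds i values. A minimal sketch of expanding it into a full symmetric lookup table, assuming the labels and rows have already been split into tokens (''read_lower_triangular'' is a hypothetical name):

<code python>
def read_lower_triangular(labels, rows):
    """Expand a lower-triangular distance matrix (diagonal included) into a
    symmetric dict keyed by (label_i, label_j)."""
    n = len(labels)
    if len(rows) != n:
        raise ValueError("MatrixSize does not match the number of rows")
    dist = {}
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should have {i + 1} values")
        for j, value in enumerate(row):
            # store both (i, j) and (j, i) so lookups work in either order
            dist[labels[i], labels[j]] = dist[labels[j], labels[i]] = float(value)
    return dist

labels = ["h1", "h2", "h3", "h4"]
rows = [line.split() for line in """0.00000
2.00000 0.00000
1.00000 2.00000 0.00000
1.00000 2.00000 1.00000 0.00000""".splitlines()]
d = read_lower_triangular(labels, rows)
print(d["h1", "h3"], d["h3", "h1"])  # 1.0 1.0
</code>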

=== Samples (mandatory): ===
Defines the haplotypic or genotypic content of each sample:
  * name of each sample (string within ""): ''SampleName = "name xy"''
  * size of the sample (int value): ''SampleSize = 732''
  * the data themselves (list of haplotypes or genotypes and their frequencies, enclosed in braces): 
<code>
[[Samples]]              #start the samples definition section 
  SampleData={ 
   id1 1  ACGGTGTCGA 
   id2 2  ACGGTGTCAG 
   id3 8  ACGGTGCCAA 
   id4 10 ACAGTGTCAA 
   id5 1  GCGGTGTCAA 
  } 
</code>
For frequency data, only the identifier and the frequency of each haplotype are listed:
<code>
SampleData={ 
  id1 1 
  id2 2 
  id3 8 
  id4 10 
  id5 1 
}
</code>
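With ''Frequency = ABS'' the numbers are counts that should add up to ''SampleSize''; with ''REL'' they are proportions. The sketch below converts counts into relative frequencies and checks the total. ''relative_frequencies'' is a hypothetical helper, and ''SampleSize = 22'' is an assumption taken as the sum of the counts in the frequency example, since that snippet gives no sample size.

<code python>
def relative_frequencies(sample_data, sample_size):
    """Convert absolute counts (Frequency = ABS) to relative frequencies,
    checking that the counts add up to SampleSize."""
    total = sum(count for _, count in sample_data)
    if total != sample_size:
        raise ValueError(f"counts sum to {total}, SampleSize is {sample_size}")
    return {hap_id: count / total for hap_id, count in sample_data}

# Counts from the frequency-data example; SampleSize = 22 is assumed
data = [("id1", 1), ("id2", 2), ("id3", 8), ("id4", 10), ("id5", 1)]
freqs = relative_frequencies(data, 22)
print(round(freqs["id4"], 4))  # 0.4545
</code>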

  * **haplotypic data:** for each haplotype, its identifier and sample frequency; if no haplotype list has been defined, the allelic content of the haplotype follows as well.

  * **genotypic data:** for each genotype, its identifier, sample frequency and allelic content, the latter written on two separate lines (one per pseudo-haplotype). Data can be entered as a list of genotypes or as a list of individuals.
<code>
Id1 2  ACTCGGGTTCGCGCGC  # the first pseudo-haplotype 
       ACTCGGGCTCACGCGC  # the second pseudo-haplotype 
</code> 
or
<code>
my_id 4    0 0 1 1 0 1 
           0 1 0 0 1 1 
</code>
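Because a genotype spans two lines, a reader has to pair the lines up. The hypothetical sketch below does so and, as a simple use of the result, computes the observed heterozygosity at one locus weighted by genotype frequency. It assumes whitespace-separated alleles and no missing data (a ''?'' would be counted as a distinct allele); with unknown gametic phase the pairing of alleles across the two lines is arbitrary, but the per-locus comparison is still valid.

<code python>
def parse_genotypes(lines):
    """Pair up two-line genotype records.
    First line: identifier, frequency, alleles of the first pseudo-haplotype.
    Second line: alleles of the second pseudo-haplotype."""
    rows = [ln.split("#", 1)[0].split() for ln in lines]
    rows = [r for r in rows if r]
    if len(rows) % 2:
        raise ValueError("odd number of lines: a genotype is incomplete")
    records = []
    for first, second in zip(rows[::2], rows[1::2]):
        ident, freq, hap1 = first[0], int(first[1]), first[2:]
        if len(second) != len(hap1):
            raise ValueError(f"{ident}: pseudo-haplotypes differ in length")
        records.append((ident, freq, hap1, second))
    return records

def observed_heterozygosity(records, locus):
    """Frequency-weighted share of individuals heterozygous at `locus`."""
    total = sum(freq for _, freq, _, _ in records)
    het = sum(freq for _, freq, h1, h2 in records if h1[locus] != h2[locus])
    return het / total

recs = parse_genotypes([
    "my_id 4    0 0 1 1 0 1",
    "           0 1 0 0 1 1",
])
print(observed_heterozygosity(recs, 0), observed_heterozygosity(recs, 1))  # 0.0 1.0
</code>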

=== Genetic structure (only required for AMOVA): ===
Specifies the hierarchical genetic structure of the samples. Samples (populations) can be combined into groups.
  * start of the subsection: <code>[[Structure]]</code>
  * name for the genetic structure (string within ""): ''StructureName = "An example"''
  * number of groups defined in the structure (int value): ''NbGroups = 5''
  * group definitions (list containing the names of the samples belonging to the group, entered within braces):
<code>
NbGroups=2
Group ={
  population1
  population2
  population3
}
Group ={
  population4
  population5
}
</code> 
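When many samples are involved, it can be convenient to generate the structure subsection rather than type it. The sketch below is a hypothetical ''structure_section'' helper that writes the ''NbGroups'' count from the number of groups, so the two cannot disagree. It quotes every sample name, assuming the names match the ''SampleName'' values exactly.

<code python>
def structure_section(name, groups):
    """Render a [[Structure]] subsection from a list of groups,
    each group being a list of sample names as given in SampleName."""
    lines = ["[[Structure]]",
             f'  StructureName="{name}"',
             f"  NbGroups={len(groups)}"]
    for group in groups:
        lines.append("  Group={")
        lines.extend(f'   "{sample}"' for sample in group)
        lines.append("  }")
    return "\n".join(lines)

text = structure_section("My population structure", [
    ["A sample of 6 Algerians", "A sample of 12 Egyptians"],
    ["A sample of 11 Bulgarians", "A sample of 8 French"],
])
print(text)
</code>

Applied to the four samples of the example project, this reproduces its ''%%[[Structure]]%%'' subsection.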

=== Mantel test settings (optional): ===
Specifies the distance matrices used in Mantel tests. The goal is to compute the correlation between the YMatrix and a matrix X1, or the partial correlation between the YMatrix, X1 and X2. The YMatrix can be either a pairwise population F<sub>ST</sub> matrix or a custom matrix entered into the project by the user. X1 (and X2) have to be defined in the project.
  * start of the subsection: <code>[[Mantel]]</code>
  * size of the matrices (positive integer): ''MatrixSize= 5''
  * number of matrices among which the correlations are computed (2 for a simple correlation between Y and X1, 3 for a partial correlation including X2): ''MatrixNumber= 2''
  * genetic distance used as the YMatrix: ''YMatrix = "fst"''
    * ''"fst"'': Y = F<sub>ST</sub>
    * ''"log_fst"'': Y = log(F<sub>ST</sub>)
    * ''"slatkinlinearfst"'': Y = F<sub>ST</sub>/(1 - F<sub>ST</sub>)
    * ''"log_slatkinlinearfst"'': Y = log(F<sub>ST</sub>/(1 - F<sub>ST</sub>))
    * ''"nm"'': Y = (1 - F<sub>ST</sub>)/(2 F<sub>ST</sub>)
    * ''"custom"'': Y is a user-specified matrix entered in the project
  * labels identifying the columns of the YMatrix (list of sample names, enclosed in braces):
<code>
YMatrixLabels = {
  "Population1 " "Population4" "Population2"
  "Population8" "Population5"
}
</code>
  * keyword defining the matrix whose correlation with the YMatrix is computed (lower triangle, diagonal included):
<code>
DistMatMantel={
  0.00
  3.20 0.00
  0.47 0.76 0.00
  0.00 1.23 0.37 0.00
  0.22 0.37 0.21 0.38 0.00
}
</code>
  * labels defining the sub-matrix on which the correlation is computed:
<code>
UsedYMatrixLabels={
  "Population1 "
  "Population5"
  "Population8"
}
</code>
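The settings above combine three steps: transform the F<sub>ST</sub> values, optionally restrict the matrices to the labels in ''UsedYMatrixLabels'', and correlate Y with X. The sketch below shows a simple (non-partial) Mantel test in plain Python: Pearson correlation over the off-diagonal cells, with a one-sided p-value from permuting rows and columns of Y together. It is an illustration of the standard technique, not Arlequin's implementation; the F<sub>ST</sub> values are hypothetical, the X matrix is the ''DistMatMantel'' example, and the transformations are applied only off the diagonal to avoid log(0) and division by zero.

<code python>
import math
import random

# YMatrix transformations listed above, applied to one pairwise Fst value
Y_TRANSFORMS = {
    "fst": lambda f: f,
    "log_fst": lambda f: math.log(f),
    "slatkinlinearfst": lambda f: f / (1 - f),
    "log_slatkinlinearfst": lambda f: math.log(f / (1 - f)),
    "nm": lambda f: (1 - f) / (2 * f),
}

def full_from_lower(rows):
    """Symmetric square matrix from lower-triangular rows (diagonal included)."""
    n = len(rows)
    return [[rows[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]

def transform(m, kind):
    """Apply a YMatrix transformation to the off-diagonal cells only."""
    f = Y_TRANSFORMS[kind]
    return [[0.0 if i == j else f(v) for j, v in enumerate(row)]
            for i, row in enumerate(m)]

def submatrix(m, labels, used):
    """Keep only the rows/columns named in UsedYMatrixLabels."""
    idx = [labels.index(u) for u in used]
    return [[m[i][j] for j in idx] for i in idx]

def lower_triangle(m):
    """Off-diagonal cells below the diagonal, row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(y, x, permutations=1000, seed=1):
    """Correlation between two distance matrices and a one-sided p-value
    from permuting the rows and columns of y simultaneously."""
    rng = random.Random(seed)
    xs = lower_triangle(x)
    r_obs = pearson(lower_triangle(y), xs)
    n, hits = len(y), 1          # the observed arrangement counts as one
    for _ in range(permutations):
        p = list(range(n))
        rng.shuffle(p)
        y_perm = [[y[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(lower_triangle(y_perm), xs) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

labels = ["Population1", "Population4", "Population2", "Population8", "Population5"]
x = full_from_lower([[0.00], [3.20, 0.00], [0.47, 0.76, 0.00],
                     [0.00, 1.23, 0.37, 0.00], [0.22, 0.37, 0.21, 0.38, 0.00]])
# Hypothetical pairwise Fst values, for illustration only
fst = full_from_lower([[0.0], [0.30, 0.0], [0.05, 0.08, 0.0],
                       [0.02, 0.12, 0.04, 0.0], [0.03, 0.04, 0.02, 0.04, 0.0]])
y = transform(fst, "slatkinlinearfst")
r, p = mantel(y, x)
print(f"r = {r:.3f}, p = {p:.3f}")
print(submatrix(y, labels, ["Population1", "Population5", "Population8"]))
</code>

A partial Mantel test with X2 (''MatrixNumber= 3'') would additionally regress X2 out of both matrices; that step is not shown here.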


===== How to cite =====
Excoffier, L. and H.E.L. Lischer (2010) Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. //Molecular Ecology Resources// 10: 564-567.