====== Arlequin ======
{{arlequin_logo.jpg?200}}
\\
**[[http://popgen.unibe.ch/software/arlequin35/|Arlequin]]**\\
[[http://popgen.unibe.ch/software/arlequin35/man/Arlequin35.pdf|manual]]
\\
Arlequin ver 3.5 (released 24 February 2010)\\
The goal of Arlequin is to provide the average user in population genetics with quite a large set of basic methods and statistical tests, in order to extract information on genetic and demographic features of a collection of population samples.\\
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods.\\
Arlequin computes indices of genetic diversity, F-statistics, and genetic distances between populations; performs exact tests of Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), and population differentiation; tests selective neutrality within populations; runs Mantel tests; estimates gametic phase from multilocus genotypes; and estimates demographic parameters from the mismatch distribution.
===== Program information =====
* program written in C++
* Windows version (XP, Vista, 7)
* Linux version
\\
===== Data type handled =====
* DNA sequences
* RFLP
* SNP
* Microsatellite
* Standard data
* Allele frequency data
\\
**In haplotypic form:**
* haplotypes (i.e. combination of alleles at one or more loci)
* haploid/diploid
\\
**In genotypic form:**
* genotypes
* diploid
* known/unknown gametic phase
* recessive/no recessive alleles
\\
===== Input Files =====
Input files contain a description of the properties of the data as well as the raw data themselves. They should have a "*.arp" extension (for ARlequin Project).
\\
An input file is structured into two main sections:
* Profile section (mandatory)
* Data section (mandatory), made up of the following subsections:
  * Haplotype list (optional)
  * Distance matrices (optional)
  * Samples (mandatory)
  * Genetic structure (optional)
  * Mantel tests (optional)
\\
==== Example: ====
The following small example is a project file containing four populations. The data type is STANDARD genotypic data with unknown gametic phase:
[Profile]
Title="Fake HLA data"
NbSamples=4
GenotypicData=1
GameticPhase=0
DataType=STANDARD
LocusSeparator=WHITESPACE
MissingData='?'
[Data]
[[Samples]]
SampleName="A sample of 6 Algerians"
SampleSize=6
SampleData={
1 1 1104 0200
0700 0301
3 3 0302 0200
1310 0402
4 2 0402 0602
1502 0602
}
SampleName="A sample of 11 Bulgarians"
SampleSize=11
SampleData={
1 1 1103 0301
0301 0200
2 4 1101 0301
0700 0200
3 1 1500 0502
0301 0200
4 1 1103 0301
1202 0301
5 1 0301 0200
1500 0601
6 3 1600 0502
1301 0603
}
SampleName="A sample of 12 Egyptians"
SampleSize=12
SampleData={
1 2 1104 0301
1600 0502
3 1 1303 0301
1101 0502
4 3 1502 0601
1500 0602
6 1 1101 0301
1101 0301
8 4 1302 0502
1101 0609
9 1 1500 0302
0402 0602
}
SampleName="A sample of 8 French"
SampleSize=8
SampleData={
219 1 0301 0200
0101 0501
239 2 0301 0200
0301 0200
249 1 1302 0604
1500 0602
250 3 1401 0503
1301 0603
254 1 1302 0604
}
[[Structure]]
StructureName="My population structure"
NbGroups=2
Group={
"A sample of 6 Algerians"
"A sample of 12 Egyptians"
}
Group={
"A sample of 11 Bulgarians"
"A sample of 8 French"
}
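To illustrate how the two-level layout (''[Profile]'', ''[Data]'' and its subsections) can be read by a script, here is a minimal Python sketch that collects the keywords of the Profile section. It is not part of Arlequin; the function name ''read_profile'' is hypothetical, and the comment stripping is deliberately naive (it assumes no ''#'' appears inside a quoted value).

```python
def read_profile(arp_text):
    """Collect the keyword=value pairs of the [Profile] section."""
    profile = {}
    in_profile = False
    for raw_line in arp_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        if line.startswith("["):
            # [Profile], [Data], [[Samples]], ... each open a new section
            in_profile = line == "[Profile]"
            continue
        if in_profile and "=" in line:
            key, value = line.split("=", 1)
            profile[key.strip()] = value.strip().strip("\"'")
    return profile


example = """[Profile]
Title="Fake HLA data"
NbSamples=4
GenotypicData=1
GameticPhase=0
DataType=STANDARD
LocusSeparator=WHITESPACE
MissingData='?'
[Data]
[[Samples]]
SampleName="A sample of 6 Algerians"
"""

profile = read_profile(example)
print(profile["DataType"], profile["NbSamples"])
```

All values are returned as strings; converting ''NbSamples'' to an integer is left to the caller.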
==== Profile section: ====
* Title (string within ""): ''Title="title xy"''
* Number of samples (int 1-1000): ''NbSamples =3''
* Type of data (DNA, RFLP, MICROSAT, STANDARD, FREQUENCY): ''DataType = DNA''
* Haplotypic/genotypic data (0/1): ''GenotypicData = 0''
\\
* Optional keywords (the __default value__ is underlined):
* locus separator (__WHITESPACE__, TAB, NONE, …): ''LocusSeparator = TAB''
* gametic phase known/unknown (__1__/0): ''GameticPhase = 1''
* recessive/co-dominant alleles (1/__0__): ''RecessiveData = 1''
* code for the recessive allele (string within "", default __"null"__): ''RecessiveAllele = "xxx"''
* code for missing data (a single character within double or single quotes, default __"?"__): ''MissingData = "$"''
* frequencies as absolute/relative values (__ABS__/REL): ''Frequency = ABS''
* significant digits for haplotype frequency outputs (real number between 1e-2 and 1e-7, default __1e-5__): ''FrequencyThreshold = 0.00001''
* convergence criterion for the EM algorithm (real number between 1e-7 and 1e-12, default __1e-7__): ''EpsilonValue = 1e-10''
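When project files are generated by a script, the mandatory keywords above can be checked before writing. The following Python sketch assumes a hypothetical helper ''format_profile''; it validates only the data type and the number of samples as listed above and writes any optional keyword verbatim, without checking its value.

```python
DATA_TYPES = {"DNA", "RFLP", "MICROSAT", "STANDARD", "FREQUENCY"}


def format_profile(title, nb_samples, data_type, genotypic, **optional):
    """Build a [Profile] section; optional keywords are written verbatim."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown DataType: {data_type}")
    if not 1 <= nb_samples <= 1000:
        raise ValueError("NbSamples must lie between 1 and 1000")
    lines = [
        "[Profile]",
        f'Title="{title}"',
        f"NbSamples={nb_samples}",
        f"DataType={data_type}",
        f"GenotypicData={1 if genotypic else 0}",
    ]
    # Keywords that are left out fall back to the defaults listed above.
    lines += [f"{key}={value}" for key, value in optional.items()]
    return "\n".join(lines)


print(format_profile("Fake HLA data", 4, "STANDARD", True,
                     GameticPhase=0, MissingData="'?'"))
```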
==== Data section: ====
=== Haplotype list (optional): ===
Defines a list of haplotypes, either inside the project file (internal) or in a separate file (external).
* internal:
[[HaplotypeDefinition]] #start the section of Haplotype definition
HaplListName="list1" #give any name you wish to this list
HaplList={
h1 A T #on each line, the name of the haplotype is
h2 G C # followed by its definition.
h3 A G
h4 A A
h5 G G
}
* external:
[[HaplotypeDefinition]] #start the section of Haplotype definition
HaplListName="list1" #give any name you wish to this list
HaplList = EXTERN "hapl_file.hap"
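As an illustration of the internal form, the sketch below reads a ''HaplList={...}'' block into a Python dictionary that maps each haplotype name to its alleles. The function name ''parse_hapl_list'' is hypothetical, and the code assumes whitespace-separated loci and ''#'' comments as in the example above; it does not handle the ''EXTERN'' form.

```python
def parse_hapl_list(block_text):
    """Map haplotype names to their allele lists for an internal HaplList."""
    haplotypes = {}
    inside = False
    for raw_line in block_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("HaplList") and line.endswith("{"):
            inside = True            # opening line: HaplList={
        elif line == "}":
            inside = False           # closing brace ends the list
        elif inside and line:
            name, *alleles = line.split()
            haplotypes[name] = alleles
    return haplotypes


block = """[[HaplotypeDefinition]]
HaplListName="list1"
HaplList={
h1 A T  #on each line, the name of the haplotype is
h2 G C  # followed by its definition.
h3 A G
h4 A A
h5 G G
}
"""

haplotypes = parse_hapl_list(block)
print(haplotypes["h2"])
```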
=== Distance matrix (optional): ===
A matrix of genetic distances between haplotypes can be specified, either inside the project file (internal) or in a separate file (external).
* internal:
[[DistanceMatrix]] #start the distance matrix definition section
MatrixName= "none" # name of the distance matrix
MatrixSize= 4 # size = number of lines of the distance matrix
MatrixData={
h1 h2 h3 h4 # labels of the distance matrix (identifier of the
0.00000 # haplotypes)
2.00000 0.00000
1.00000 2.00000 0.00000
1.00000 2.00000 1.00000 0.00000
}
* external:
[[DistanceMatrix]] #start the distance matrix definition section
MatrixName= "none" # name of the distance matrix
MatrixSize= 4 # size = number of lines of the distance matrix
MatrixData= EXTERN "mat_file.dis"
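The internal matrix is given as a lower triangle, one row per haplotype, preceded by a line of labels. A short Python sketch of how such a block could be expanded into a full symmetric matrix follows; ''parse_matrix_data'' is a hypothetical name, and the input is assumed to be the text between the braces of ''MatrixData={...}''.

```python
def parse_matrix_data(text):
    """Return (labels, full symmetric matrix) from the body of MatrixData={...}."""
    rows = [line.split("#", 1)[0].split() for line in text.splitlines()]
    rows = [row for row in rows if row]   # skip blank or comment-only lines
    labels, triangle = rows[0], rows[1:]
    n = len(labels)
    if len(triangle) != n:
        raise ValueError("expected one matrix row per label")
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(triangle):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should hold {i + 1} values")
        for j, value in enumerate(row):
            # mirror each value across the diagonal
            full[i][j] = full[j][i] = float(value)
    return labels, full


body = """
h1 h2 h3 h4      # labels of the distance matrix
0.00000
2.00000 0.00000
1.00000 2.00000 0.00000
1.00000 2.00000 1.00000 0.00000
"""

labels, matrix = parse_matrix_data(body)
print(labels, matrix[0][1], matrix[1][0])
```

The row-length check also catches a ''MatrixSize'' that does not match the number of rows actually supplied.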
=== Samples (mandatory): ===
Defines the haplotypic or genotypic content of the different samples:
* name of each sample (string within ""): ''SampleName = "name xy"''
* size of the sample (int value): ''SampleSize = 732''
* the data itself (list of haplotypes or genotypes and their frequencies, entered within braces):
[[Samples]] #start the samples definition section
SampleData={
id1 1 ACGGTGTCGA
id2 2 ACGGTGTCAG
id3 8 ACGGTGCCAA
id4 10 ACAGTGTCAA
id5 1 GCGGTGTCAA
}
Frequency data:
SampleData={
id1 1
id2 2
id3 8
id4 10
id5 1
}
* **haplotypic data:** for each haplotype, its identifier and its sample frequency (and, if no haplotype list has been defined, also the allelic content of the haplotype)
* **genotypic data:** for each genotype, its identifier, its sample frequency, and its allelic content on two separate lines. Genotypes can be entered either as a list of genotypes or as a list of individuals:
Id1 2 ACTCGGGTTCGCGCGC # the first pseudo-haplotype
ACTCGGGCTCACGCGC # the second pseudo-haplotype
or
my_id 4 0 0 1 1 0 1
0 1 0 0 1 1
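The two-line genotype layout can be read by pairing consecutive lines. The Python sketch below does this for the body of a ''SampleData={...}'' block and sums the frequencies, which can then be compared with ''SampleSize''. The name ''parse_genotypes'' is hypothetical, and whitespace-separated loci are assumed; the data are the Algerian sample from the example project above.

```python
def parse_genotypes(sample_data):
    """Parse genotypic SampleData: two lines per genotype."""
    rows = [line.split("#", 1)[0].split() for line in sample_data.splitlines()]
    rows = [row for row in rows if row]   # skip blank or comment-only lines
    if len(rows) % 2:
        raise ValueError("each genotype needs two lines")
    genotypes = []
    for first, second in zip(rows[0::2], rows[1::2]):
        # first line: identifier, frequency, then the first pseudo-haplotype
        identifier, count, *alleles = first
        if len(alleles) != len(second):
            raise ValueError(f"genotype {identifier}: unequal number of loci")
        genotypes.append((identifier, int(count), alleles, second))
    return genotypes


algerians = """
1 1 1104 0200
    0700 0301
3 3 0302 0200
    1310 0402
4 2 0402 0602
    1502 0602
"""

genotypes = parse_genotypes(algerians)
sample_size = sum(count for _, count, _, _ in genotypes)
print(len(genotypes), sample_size)
```

Here the frequencies add up to the ''SampleSize=6'' declared for this sample in the example project.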
=== Genetic structure (only required for AMOVA): ===
Specifies the hierarchical genetic structure of the samples. It is possible to define groups of populations.
* start of the subsection: ''%%[[Structure]]%%''
* name of the genetic structure (string within ""): ''StructureName = "An example"''
* number of groups defined in the structure (int value): ''NbGroups = 5''
* group definitions (list containing the names of the samples belonging to the group, entered within braces):
NbGroups=2
Group ={
population1
population2
population3
}
Group ={
population4
population5
}
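Because ''NbGroups'' has to agree with the number of ''Group={...}'' blocks, it can be convenient to generate the subsection from a list of groups. This is a minimal Python sketch under that assumption; ''format_structure'' is a hypothetical helper, and the sample names are those of the example project above.

```python
def format_structure(name, groups):
    """Build a [[Structure]] subsection from a list of sample-name lists."""
    lines = [
        "[[Structure]]",
        f'StructureName="{name}"',
        f"NbGroups={len(groups)}",   # derived, so it always matches the groups
    ]
    for group in groups:
        lines.append("Group={")
        lines.extend(f'"{sample}"' for sample in group)
        lines.append("}")
    return "\n".join(lines)


structure = format_structure(
    "My population structure",
    [
        ["A sample of 6 Algerians", "A sample of 12 Egyptians"],
        ["A sample of 11 Bulgarians", "A sample of 8 French"],
    ],
)
print(structure)
```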
=== Mantel test settings ===
This subsection specifies the distance matrices used in a Mantel test. The goal is to compute the correlation between the YMatrix and X1, or the partial correlations between the YMatrix, X1, and X2. The YMatrix can be either a pairwise population FST matrix or a custom matrix entered into the project by the user. X1 (and X2) have to be defined in the project.
* start of the subsection: ''%%[[Mantel]]%%''
* size of the matrices (positive int value): ''MatrixSize= 5''
* number of matrices among which the correlations are computed (2/3): ''MatrixNumber= 2''
* matrix that is used as genetic distance: ''YMatrix = "fst"''. Possible values:
  * "fst": Y = Fst
  * "log_fst": Y = log(Fst)
  * "slatkinlinearfst": Y = Fst/(1-Fst)
  * "log_slatkinlinearfst": Y = log(Fst/(1-Fst))
  * "nm": Y = (1-Fst)/(2 Fst)
  * "custom": Y is specified by the user in the project
* labels that identify the columns of the YMatrix (list of label names, entered within braces):
YMatrixLabels = {
"Population1 " "Population4" "Population2"
"Population8" "Population5"
}
* keyword that defines a matrix whose correlation with the YMatrix is computed:
DistMatMantel={
0.00
3.20 0.00
0.47 0.76 0.00
0.00 1.23 0.37 0.00
0.22 0.37 0.21 0.38 0.00
}
* labels defining the sub-matrix on which the correlation is computed:
UsedYMatrixLabels={
"Population1 "
"Population5"
"Population8"
}
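The ''YMatrix'' options other than "custom" are simple transformations of a pairwise Fst value. The Python sketch below writes them out as listed above. It is an illustration only, not Arlequin's own code: ''transform_fst'' is a hypothetical name, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math


def transform_fst(fst, kind="fst"):
    """Apply one of the YMatrix transformations to a single Fst value."""
    if kind == "fst":
        return fst
    if kind == "log_fst":
        return math.log(fst)                 # assumed natural logarithm
    if kind == "slatkinlinearfst":
        return fst / (1 - fst)
    if kind == "log_slatkinlinearfst":
        return math.log(fst / (1 - fst))     # assumed natural logarithm
    if kind == "nm":
        return (1 - fst) / (2 * fst)
    raise ValueError(f"unsupported YMatrix keyword: {kind}")


for kind in ("fst", "slatkinlinearfst", "nm"):
    print(kind, transform_fst(0.2, kind))
```

The logarithmic and "nm" forms are undefined for Fst = 0, and the Slatkin forms for Fst = 1, so a script applying them to a whole matrix would need to treat such entries separately.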
===== How to cite =====
Excoffier, L. and H.E.L. Lischer (2010) Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10: 564-567.