Arlequin
Arlequin ver 3.11 (released 19 February 2007)
The goal of Arlequin is to provide the average user in population genetics with quite a large set of basic methods and statistical tests, in order to extract information on genetic and demographic features of a collection of population samples.
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods.
Arlequin computes indices of genetic diversity, F-statistics and genetic distances between populations; performs exact tests of Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD) and population differentiation; tests selective neutrality within populations; performs Mantel tests; estimates gametic phase from multilocus genotypes; and estimates demographic parameters from the mismatch distribution.
Program information
- program written in C++
- Windows version (2000, XP, and above)
Data type handled
- DNA sequences
- RFLP
- SNP
- Microsatellite
- Standard data
- Allele frequency data
In haplotypic form:
- haplotypes (i.e. combination of alleles at one or more loci)
- haploid/diploid
In genotypic form:
- genotypes
- diploid
- known/unknown gametic phase
- recessive/no recessive alleles
Input Files
Contain the description of the properties of the data, as well as the raw data themselves. The input files should have a “*.arp” extension (for ARlequin Project).
Input files are structured into two main sections:
- Profile section (mandatory)
- Data section (mandatory):
- Haplotype list (optional)
- Distance matrices (optional)
- Samples (mandatory)
- Genetic structure (optional)
- Mantel tests (optional)
Example:
The following small example is a project file containing four populations. The data type is STANDARD genotypic data with unknown gametic phase:
[Profile]
  Title="Fake HLA data"
  NbSamples=4
  GenotypicData=1
  GameticPhase=0
  DataType=STANDARD
  LocusSeparator=WHITESPACE
  MissingData='?'
[Data]
  [[Samples]]
    SampleName="A sample of 6 Algerians"
    SampleSize=6
    SampleData={
      1 1 1104 0200
          0700 0301
      3 3 0302 0200
          1310 0402
      4 2 0402 0602
          1502 0602
    }
    SampleName="A sample of 11 Bulgarians"
    SampleSize=11
    SampleData={
      1 1 1103 0301
          0301 0200
      2 4 1101 0301
          0700 0200
      3 1 1500 0502
          0301 0200
      4 1 1103 0301
          1202 0301
      5 1 0301 0200
          1500 0601
      6 3 1600 0502
          1301 0603
    }
    SampleName="A sample of 12 Egyptians"
    SampleSize=12
    SampleData={
      1 2 1104 0301
          1600 0502
      3 1 1303 0301
          1101 0502
      4 3 1502 0601
          1500 0602
      6 1 1101 0301
          1101 0301
      8 4 1302 0502
          1101 0609
      9 1 1500 0302
          0402 0602
    }
    SampleName="A sample of 8 French"
    SampleSize=8
    SampleData={
      219 1 0301 0200
            0101 0501
      239 2 0301 0200
            0301 0200
      249 1 1302 0604
            1500 0602
      250 3 1401 0503
            1301 0603
      254 1 1302 0604
    }
  [[Structure]]
    StructureName="My population structure"
    NbGroups=2
    Group={
      "A sample of 6 Algerians"
      "A sample of 12 Egyptians"
    }
    Group={
      "A sample of 11 Bulgarians"
      "A sample of 8 French"
    }
Profile section:
- Title (string within double quotes):
Title = "title xy"
- Number of samples (int 1-1000):
NbSamples = 3
- Type of data (DNA, RFLP, MICROSAT, STANDARD, FREQUENCY):
DataType = DNA
- Haplotypic/genotypic data (0/1):
GenotypicData = 0
- Optional keywords (a default value is used when a keyword is omitted):
- locus separator (WHITESPACE, TAB, NONE, …):
LocusSeparator = TAB
- gametic phase known/unknown (1/0):
GameticPhase = 1
- recessive/co-dominant allele (1/0):
RecessiveData = 1
- code for the recessive allele (string within double quotes; default "null"):
RecessiveAllele = "xxx"
- code for missing data (a single character within single or double quotes; default '?'):
MissingData = '$'
- frequencies as absolute/relative values (ABS/REL):
Frequency = ABS
- significant digits for haplotype frequency outputs (real number between 1e-2 and 1e-7; default 1e-5):
FrequencyThreshold = 0.00001
- convergence criterion for the EM algorithm (real number 1e-7 – 1e-12):
EpsilonValue = 1e-10
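To make the keyword = value layout of the profile section concrete, here is a minimal Python sketch that reads the [Profile] section of a project file into a dictionary. It is not part of Arlequin: the helper name `parse_profile` is hypothetical, values are kept as plain strings, defaults are not filled in, and the sketch assumes that `#` only ever starts a comment.

```python
import re


def parse_profile(text):
    """Return the keyword/value pairs of the [Profile] section as a dict.

    Hypothetical helper for illustration only. Values stay strings, with
    surrounding single or double quotes removed.
    """
    profile = {}
    in_profile = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        if line.startswith("["):
            # Section headers such as [Profile] or [Data]
            in_profile = line.lower() == "[profile]"
            continue
        if in_profile:
            match = re.match(r"(\w+)\s*=\s*(.+)", line)
            if match:
                key, value = match.groups()
                profile[key] = value.strip().strip("\"'")
    return profile


example = """
[Profile]
  Title="Fake HLA data"
  NbSamples=4
  GenotypicData=1
  GameticPhase=0
  DataType=STANDARD
  LocusSeparator=WHITESPACE
  MissingData='?'
[Data]
  [[Samples]]
    SampleName="A sample of 6 Algerians"
"""

profile = parse_profile(example)
print(profile["DataType"], profile["NbSamples"])
```

Keywords that appear after the [Data] header, such as SampleName, are deliberately ignored by this sketch.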
Data section:
Haplotype list (optional):
Defines a list of haplotypes, either inside the project file (intern) or in an external file (extern).
- intern:
[[HaplotypeDefinition]]     #start the section of Haplotype definition
HaplListName="list1"        #give any name you wish to this list
HaplList={
  h1 A T                    #on each line, the name of the haplotype is
  h2 G C                    # followed by its definition.
  h3 A G
  h4 A A
  h5 G G
}
- extern:
[[HaplotypeDefinition]]     #start the section of Haplotype definition
HaplListName="list1"        #give any name you wish to this list
HaplList = EXTERN "hapl_file.hap"
Distance matrix (optional):
A matrix of genetic distances between haplotypes can be specified, either inside the project file (intern) or in an external file (extern).
- intern:
[[DistanceMatrix]]          #start the distance matrix definition section
MatrixName= "none"          # name of the distance matrix
MatrixSize= 4               # size = number of lines of the distance matrix
MatrixData={
  h1 h2 h3 h4               # labels of the distance matrix (identifier of the haplotypes)
  0.00000
  2.00000 0.00000
  1.00000 2.00000 0.00000
  1.00000 2.00000 1.00000 0.00000
}
- extern:
[[DistanceMatrix]]          #start the distance matrix definition section
MatrixName= "none"          # name of the distance matrix
MatrixSize= 4               # size = number of lines of the distance matrix
MatrixData= EXTERN "mat_file.dis"
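The intern example lists ten values for four haplotypes, which suggests a lower-triangular layout that includes the diagonal. Assuming that layout, the following Python sketch expands such rows into a full symmetric lookup table. The function name `expand_lower_triangle` is hypothetical and the code is only an illustration, not how Arlequin itself stores the matrix.

```python
def expand_lower_triangle(labels, rows):
    """Build a symmetric dict-of-dicts from lower-triangular rows.

    Assumption: row i (0-based) holds i + 1 values, the last one being
    the diagonal entry for labels[i].
    """
    if len(rows) != len(labels):
        raise ValueError("expected one row per label")
    full = {label: {} for label in labels}
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should hold {i + 1} values")
        for j, value in enumerate(row):
            # Mirror each entry across the diagonal
            full[labels[i]][labels[j]] = value
            full[labels[j]][labels[i]] = value
    return full


labels = ["h1", "h2", "h3", "h4"]
rows = [
    [0.0],
    [2.0, 0.0],
    [1.0, 2.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
]
dist = expand_lower_triangle(labels, rows)
print(dist["h1"]["h4"], dist["h4"]["h1"])
```

The row-length check also catches a MatrixSize that disagrees with the number of rows supplied.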
Samples (mandatory):
Defines haplotypic/genotypic content of the different samples
- name for each sample (string within double quotes):
SampleName = "name xy"
- size of sample (int value):
SampleSize = 732
- data itself (list of haplotypes or genotypes and their frequencies, entered within braces):
[[Samples]]                 #start the samples definition section
SampleData={
  id1 1 ACGGTGTCGA
  id2 2 ACGGTGTCAG
  id3 8 ACGGTGCCAA
  id4 10 ACAGTGTCAA
  id5 1 GCGGTGTCAA
}
Frequency data:
SampleData={
  id1 1
  id2 2
  id3 8
  id4 10
  id5 1
}
- haplotypic data: for each haplotype, its identifier and sample frequency; if no haplotype list has been defined, also the allelic content of the haplotype
- genotypic data: for each genotype, its identifier, sample frequency and allelic content (on two separate lines). The data can be entered as a list of genotypes or as a list of individuals.
Id1 2 ACTCGGGTTCGCGCGC      # the first pseudo-haplotype
      ACTCGGGCTCACGCGC      # the second pseudo-haplotype
or
my_id 4 0 0 1 1 0 1
        0 1 0 0 1 1
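As a rough illustration of the haplotypic layout (identifier, frequency, optional allelic content), the Python sketch below parses the lines of a SampleData block and sums the frequencies. In the example project file shown earlier, the frequencies of each sample add up to its SampleSize, so such a sum can serve as a sanity check. The helper name `read_haplotypic_sample` is hypothetical, and genotypic data (two lines per genotype) is not handled.

```python
def read_haplotypic_sample(lines):
    """Parse 'identifier frequency [allelic content]' lines into tuples.

    Hypothetical helper for haplotypic data only.
    """
    records = []
    for raw in lines:
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not fields:
            continue
        identifier, frequency = fields[0], int(fields[1])
        content = " ".join(fields[2:])  # empty when a haplotype list is used
        records.append((identifier, frequency, content))
    return records


sample_lines = [
    "id1 1 ACGGTGTCGA",
    "id2 2 ACGGTGTCAG",
    "id3 8 ACGGTGCCAA",
    "id4 10 ACAGTGTCAA",
    "id5 1 GCGGTGTCAA",
]
records = read_haplotypic_sample(sample_lines)
total = sum(frequency for _, frequency, _ in records)
print(total)  # compare this with the declared SampleSize
```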
Genetic structure (only required for AMOVA):
Specifies the hierarchical genetic structure of the samples. It is possible to define groups of populations.
- start of the subsection:
[[Structure]]
- name for the genetic structure (string within double quotes):
StructureName = "An example"
- number of groups defined in the structure (int value):
NbGroups = 5
- group definitions (list containing the names of the samples belonging to the group, entered within braces):
NbGroups=2
Group ={
  population1
  population2
  population3
}
Group ={
  population4
  population5
}
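When a project contains many samples, the structure subsection can be generated rather than typed by hand. The Python sketch below writes a [[Structure]] block from a list of groups, so that NbGroups always matches the number of Group definitions. The function name `format_structure` is hypothetical, and quoting the sample names follows the example project file shown earlier rather than any rule stated here.

```python
def format_structure(name, groups):
    """Return the text of a [[Structure]] subsection.

    `groups` is a list of lists of sample names (hypothetical helper).
    """
    lines = [
        "[[Structure]]",
        f'StructureName="{name}"',
        f"NbGroups={len(groups)}",  # derived, so it cannot drift out of sync
    ]
    for group in groups:
        lines.append("Group={")
        lines.extend(f'  "{sample}"' for sample in group)
        lines.append("}")
    return "\n".join(lines)


text = format_structure(
    "My population structure",
    [
        ["A sample of 6 Algerians", "A sample of 12 Egyptians"],
        ["A sample of 11 Bulgarians", "A sample of 8 French"],
    ],
)
print(text)
```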
Mantel test settings
This subsection specifies the distance matrices used in a Mantel test. The goal is to compute a correlation between the YMatrix and X1, or a partial correlation between the YMatrix, X1 and X2. The YMatrix can be either a pairwise population FST matrix or a custom matrix entered into the project by the user. X1 (and X2) have to be defined in the project.
- start of the subsection:
[[Mantel]]
- size of the matrices (pos. int value):
MatrixSize= 5
- number of matrices among which we compute the correlations (2/3):
MatrixNumber= 2
- matrix that is used as genetic distance. Possible values: "fst" (Y = Fst), "log_fst" (Y = log(Fst)), "slatkinlinearfst" (Y = Fst/(1-Fst)), "log_slatkinlinearfst" (Y = log(Fst/(1-Fst))), "nm" (Y = (1-Fst)/(2 Fst)), "custom" (Y is user-specified in the project):
YMatrix = "fst"
- labels that identify the columns of the YMatrix (list of label names, entered within braces):
YMatrixLabels = { "Population1 " "Population4" "Population2" "Population8" "Population5" }
- keyword that defines a matrix whose correlation with the YMatrix is computed:
DistMatMantel={
  0.00
  3.20 0.00
  0.47 0.76 0.00
  0.00 1.23 0.37 0.00
  0.22 0.37 0.21 0.38 0.00
}
- labels defining the sub-matrix on which the correlation is computed:
UsedYMatrixLabels={ "Population1 " "Population5" "Population8" }
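The YMatrix keywords listed above each correspond to a simple transformation of a pairwise FST value. The Python sketch below spells them out so that the effect of each option can be checked by hand. The function name `transform_fst` is hypothetical, and the natural logarithm is an assumption, since the text only says "log".

```python
import math


def transform_fst(fst, y_matrix="fst"):
    """Apply the transformation named by a YMatrix keyword to one Fst value.

    Hypothetical helper; the "custom" option is not covered because its
    values come from the user. Natural log is an assumption.
    """
    if y_matrix == "fst":
        return fst
    if y_matrix == "log_fst":
        return math.log(fst)
    if y_matrix == "slatkinlinearfst":
        return fst / (1.0 - fst)
    if y_matrix == "log_slatkinlinearfst":
        return math.log(fst / (1.0 - fst))
    if y_matrix == "nm":
        return (1.0 - fst) / (2.0 * fst)
    raise ValueError(f"unsupported YMatrix keyword: {y_matrix}")


for keyword in ("fst", "slatkinlinearfst", "nm"):
    print(keyword, transform_fst(0.5, keyword))
```

Note that the logarithmic options and "nm" are undefined for an FST of zero, so such pairs would need separate handling.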
How to cite
Excoffier, L., G. Laval, and S. Schneider (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.