
Arlequin

Arlequin ver 3.11 (released 19 February 2007)
The goal of Arlequin is to provide the average user in population genetics with a large set of basic methods and statistical tests for extracting information on the genetic and demographic features of a collection of population samples.
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods.
Arlequin computes indices of genetic diversity, F-statistics and genetic distances between populations; performs exact tests of Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD) and population differentiation; tests selective neutrality within populations; performs Mantel tests; estimates gametic phase from multilocus genotypes; and estimates demographic parameters from the mismatch distribution.

Program information

Program written in C++
Windows version (2000, XP, and above)

Data type handled

DNA sequences
RFLP
SNP
Microsatellite
Standard data
Allele frequency data

In haplotypic form:

haplotypes (i.e. combination of alleles at one or more loci)
haploid/diploid

In genotypic form:

genotypes
diploid
known/unknown gametic phase
with/without recessive alleles

Input Files

Input files contain a description of the properties of the data as well as the raw data themselves. They should have an “*.arp” extension (for ARlequin Project).

Project files are structured into two main sections:

Profile section (mandatory)
Data section (mandatory):
- Haplotype list (optional)
- Distance matrices (optional)
- Samples (mandatory)
- Genetic structure (optional)
- Mantel tests (optional)
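Because the section hierarchy is marked only by brackets (one pair for top-level sections such as [Profile] and [Data], two pairs for subsections such as [[Samples]]), it can be recovered with a few lines of code. The Python sketch below is an illustration, not part of Arlequin; the function names are hypothetical, and it assumes each header sits on its own line, optionally followed by a # comment.

```python
import re

# A header is "[Name]" (section) or "[[Name]]" (subsection) on its own line,
# optionally followed by a "#" comment.
HEADER = re.compile(r"^\s*(\[\[?)([A-Za-z]+)(\]\]?)\s*(#.*)?$")


def list_sections(text):
    """Return (level, name) for every section header, in file order."""
    found = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m and len(m.group(1)) == len(m.group(3)):  # brackets must balance
            found.append((len(m.group(1)), m.group(2)))
    return found


def missing_mandatory(text):
    """Names of the mandatory sections (Profile, Data, Samples) not found."""
    names = {name for _, name in list_sections(text)}
    return [s for s in ("Profile", "Data", "Samples") if s not in names]
```

For the four-population example, list_sections would report Profile and Data at level 1 and Samples and Structure at level 2.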

Example:

The following small example is a project file containing four populations. The data type is STANDARD genotypic data with unknown gametic phase:

[Profile] 
  Title="Fake HLA data"
  NbSamples=4 
  GenotypicData=1 
  GameticPhase=0 
  DataType=STANDARD 
  LocusSeparator=WHITESPACE 
  MissingData='?' 
[Data] 
[[Samples]] 
  SampleName="A sample of 6 Algerians" 
  SampleSize=6 
  SampleData={ 
   1 1 1104 0200 
       0700 0301 
   3 3 0302 0200 
       1310 0402 
   4 2 0402 0602 
       1502 0602 
  } 
  SampleName="A sample of 11 Bulgarians" 
  SampleSize=11 
  SampleData={ 
   1 1 1103 0301 
       0301 0200 
   2 4 1101 0301 
       0700 0200 
   3 1 1500 0502 
       0301 0200 
   4 1 1103 0301 
       1202 0301 
   5 1 0301 0200 
       1500 0601 
   6 3 1600 0502 
       1301 0603 
  } 
  SampleName="A sample of 12 Egyptians" 
  SampleSize=12 
  SampleData={ 
   1 2 1104 0301 
       1600 0502 
   3 1 1303 0301 
       1101 0502 
   4 3 1502 0601 
       1500 0602 
   6 1 1101 0301 
       1101 0301 
   8 4 1302 0502 
       1101 0609 
   9 1 1500 0302 
       0402 0602 
  } 
  SampleName="A sample of 8 French" 
  SampleSize=8 
  SampleData={ 
   219 1 0301 0200 
         0101 0501 
   239 2 0301 0200 
         0301 0200 
   249 1 1302 0604 
         1500 0602 
   250 3 1401 0503 
         1301 0603 
   254 1 1302 0604 
  } 
[[Structure]] 
  StructureName="My population structure" 
  NbGroups=2 
  Group={ 
   "A sample of 6 Algerians" 
   "A sample of 12 Egyptians" 
  } 
  Group={ 
   "A sample of 11 Bulgarians" 
   "A sample of 8 French" 
  }

Profile section:

Title (string within double quotes): Title="title xy"
Number of samples (int 1-1000): NbSamples = 3
Type of data (DNA, RFLP, MICROSAT, STANDARD, FREQUENCY): DataType = DNA
Haplotypic/genotypic data (0/1): GenotypicData = 0

Optional keywords (default values in parentheses, where given):
- locus separator (WHITESPACE, TAB, NONE, …): LocusSeparator = TAB
- gametic phase known/unknown (1/0): GameticPhase = 1
- recessive/co-dominant alleles (1/0): RecessiveData = 1
- code for the recessive allele (string within double quotes, default "null"): RecessiveAllele = "xxx"
- code for missing data (single character within double or single quotes, default '?'): MissingData = '$'
- frequencies as absolute/relative values (ABS/REL): Frequency = ABS
- significant digits for haplotype frequency outputs (real number from 1e-2 to 1e-7, default 1e-5): FrequencyThreshold = 0.00001
- convergence criterion for the EM algorithm (real number from 1e-7 to 1e-12): EpsilonValue = 1e-10
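A minimal sketch of reading these keywords into a dictionary, in Python. It is not Arlequin's own parser: the function name is hypothetical, values are kept as strings, and it assumes that # starts a comment (so a value containing # would be cut off) and that one pair of surrounding quotes can be dropped.

```python
def parse_profile(text):
    """Collect Keyword=value pairs of the [Profile] section into a dict.

    Assumptions: '#' starts a comment, values stay strings, and one pair of
    surrounding quotes (double or single) is removed.
    """
    profile = {}
    in_profile = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("["):  # any header starts or ends the Profile section
            in_profile = line == "[Profile]"
            continue
        if in_profile and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            profile[key] = value
    return profile
```

Numeric keywords can then be converted where needed, e.g. int(profile["NbSamples"]).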

Data section:

Haplotype list (optional):

Defines a list of haplotypes, either internally (in the project file) or externally (in a separate file).

Internal:

[[HaplotypeDefinition]] #start the section of Haplotype definition 
  HaplListName="list1"  #give this list any name you wish 
  HaplList={ 
   h1 A T              #on each line, the name of the haplotype is 
   h2 G C              # followed by its definition. 
   h3 A G 
   h4 A A 
   h5 G G 
  }

External:

[[HaplotypeDefinition]] #start the section of Haplotype definition 
  HaplListName="list1"  #give this list any name you wish 
  HaplList = EXTERN "hapl_file.hap"
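The internal HaplList maps each haplotype name to its alleles. A hypothetical Python helper that builds that mapping from the lines between the braces might look like this. It assumes whitespace-separated fields and # comments, and it does not handle the EXTERN form, whose file layout is not described here.

```python
def parse_hapl_list(lines):
    """Map each haplotype name to its list of alleles.

    `lines` are the entries between "HaplList={" and "}"; text after '#'
    is treated as a comment and blank lines are skipped.
    """
    table = {}
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        if fields:
            name, *alleles = fields
            table[name] = alleles
    return table
```

With such a table, a sample entry that only names h2 can be expanded to its alleles G and C.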

Distance matrix (optional):

A matrix of genetic distances between haplotypes can be specified, either internally or externally.

Internal:

[[DistanceMatrix]]      #start the distance matrix definition section 
   MatrixName= "none"  # name of the distance matrix 
   MatrixSize= 4        # size = number of lines of the distance matrix 
   MatrixData={        
     h1 h2 h3 h4        # labels of the distance matrix (identifier of the 
     0.00000            # haplotypes) 
     2.00000 0.00000 
     1.00000 2.00000 0.00000 
     1.00000 2.00000 1.00000 0.00000 
   }

External:

[[DistanceMatrix]]      #start the distance matrix definition section 
   MatrixName= "none"  # name of the distance matrix 
   MatrixSize= 4        # size = number of lines of the distance matrix 
   MatrixData= EXTERN "mat_file.dis"
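The internal MatrixData block gives the labels on its first line, followed by the lower triangle of the matrix including the zero diagonal. The Python sketch below uses a hypothetical helper and assumes exactly that layout. It expands the block into a full symmetric matrix and rejects rows of the wrong length.

```python
def parse_lower_triangle(lines):
    """Turn MatrixData lines into (labels, full symmetric matrix).

    Assumed layout: first line holds the labels, then row i holds
    i + 1 values ending with the 0 on the diagonal; '#' starts a comment.
    """
    labels = lines[0].split("#", 1)[0].split()
    n = len(labels)
    rows = [line.split("#", 1)[0].split() for line in lines[1:]]
    rows = [r for r in rows if r]  # skip blank lines
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, got {len(rows)}")
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} values")
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = float(value)  # mirror into upper triangle
    return labels, full
```

In this layout, MatrixSize corresponds to the number of labels and data rows, 4 in the example.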

Samples (mandatory):

Defines the haplotypic or genotypic content of each sample.

name for each sample (string within double quotes): SampleName = "name xy"
size of sample (int value): SampleSize = 732
data itself (list of haplotypes or genotypes and their frequencies, entered within braces):

[[Samples]]              #start the samples definition section 
  SampleData={ 
   id1 1  ACGGTGTCGA 
   id2 2  ACGGTGTCAG 
   id3 8  ACGGTGCCAA 
   id4 10 ACAGTGTCAA 
   id5 1  GCGGTGTCAA 
  }

For frequency data (DataType = FREQUENCY), each line gives only an identifier and its frequency:

SampleData={ 
  id1 1 
  id2 2 
  id3 8 
  id4 10 
  id5 1 
}
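Checking a sample block before running an analysis can catch typing errors. The sketch below sums the frequency column of SampleData entries. It assumes absolute counts (Frequency = ABS). It also assumes that SampleSize should equal this sum, which holds for every sample in the four-population example. For the five entries above the sum is 1 + 2 + 8 + 10 + 1 = 22. The function name is hypothetical.

```python
def total_count(sample_lines):
    """Sum the frequency column (second field) of haplotypic or frequency entries.

    Assumes absolute counts (Frequency = ABS); '#' starts a comment.
    """
    total = 0
    for raw in sample_lines:
        fields = raw.split("#", 1)[0].split()
        if fields:
            total += int(fields[1])
    return total
```

The result can be compared with the declared SampleSize before the file is loaded.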

Haplotypic data: for each haplotype, its identifier and sample frequency, followed by its allelic content if no haplotype list has been defined.

Genotypic data: for each genotype, its identifier, sample frequency and allelic content, written on two separate lines (one per pseudo-haplotype). The data can be entered as a list of genotypes or as a list of individuals.

Id1 2  ACTCGGGTTCGCGCGC  # the first pseudo-haplotype 
       ACTCGGGCTCACGCGC  # the second pseudo-haplotype

or

my_id 4    0 0 1 1 0 1 
           0 1 0 0 1 1
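Because each genotype occupies two lines, a reader has to pair them up. The following minimal Python sketch uses a hypothetical Genotype record. It assumes that only the first line of each pair carries the identifier and frequency and that # starts a comment.

```python
from typing import List, NamedTuple


class Genotype(NamedTuple):
    ident: str        # genotype or individual identifier
    freq: int         # sample frequency
    hap1: List[str]   # allelic content of the first pseudo-haplotype
    hap2: List[str]   # allelic content of the second pseudo-haplotype


def parse_genotypes(lines):
    """Pair up two-line entries: 'id freq alleles...' followed by 'alleles...'."""
    rows = [line.split("#", 1)[0].split() for line in lines]
    rows = [r for r in rows if r]  # drop blank and comment-only lines
    if len(rows) % 2:
        raise ValueError("a genotype is missing its second line")
    result = []
    for first, second in zip(rows[0::2], rows[1::2]):
        ident, freq, *hap1 = first
        if len(hap1) != len(second):
            raise ValueError(f"{ident}: pseudo-haplotypes differ in length")
        result.append(Genotype(ident, int(freq), hap1, second))
    return result
```

For DNA data each pseudo-haplotype is a single sequence string, and for STANDARD data it is one field per locus.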

Genetic structure (only required for AMOVA):

Specifies the hierarchical genetic structure of the samples by defining groups of populations.

Start of the subsection:
```
[[Structure]]
```
name for the genetic structure (string within double quotes): StructureName = "An example"
number of groups defined in the structure (int value): NbGroups = 5
group definitions (list containing the names of the samples belonging to the group, entered within braces):

NbGroups=2
Group={
  "population1"
  "population2"
  "population3"
}
Group={
  "population4"
  "population5"
}
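Generating the Structure subsection programmatically keeps NbGroups consistent with the number of Group blocks. The following hypothetical Python helper quotes each sample name, as in the four-population example.

```python
def structure_block(name, groups):
    """Render a [[Structure]] subsection from a list of groups of sample names."""
    out = [
        "[[Structure]]",
        f'  StructureName="{name}"',
        f"  NbGroups={len(groups)}",  # derived, so it cannot disagree with the groups
    ]
    for group in groups:
        out.append("  Group={")
        out.extend(f'   "{sample}"' for sample in group)
        out.append("  }")
    return "\n".join(out)
```

Sample names must match the SampleName entries exactly, since the groups refer to samples by name.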

Mantel test settings (optional):

Allows the user to specify distance matrices in order to compute the correlation between the YMatrix and a matrix X1, or the partial correlation between the YMatrix, X1 and X2. The YMatrix can be either a matrix of pairwise population FST values or a custom matrix entered into the project by the user. X1 (and X2) must be defined in the project.
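To show what is being computed, here is a generic Mantel permutation test in Python. It computes the Pearson correlation between the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of Y together. This is a textbook sketch, not Arlequin's implementation. The partial test with X2 is not covered, and the number of permutations, the seed and the one-sided p-value are illustrative choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_entries(m, order):
    """Below-diagonal entries of m after relabelling rows/columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i)]


def mantel(y, x, permutations=999, seed=1):
    """Return r(Y, X1) and a one-sided permutation p-value (large r is extreme)."""
    identity = list(range(len(y)))
    xs = lower_entries(x, identity)  # X1 stays fixed
    r_obs = pearson(lower_entries(y, identity), xs)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute populations: rows and columns of Y together
        if pearson(lower_entries(y, perm), xs) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Permuting whole rows and columns rather than individual cells preserves the dependence between distances that share a population, which is why the Mantel test is used instead of an ordinary correlation test.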

Start of the subsection:
```
[[Mantel]]
```
size of the matrices (positive int value): MatrixSize = 5
number of matrices among which the correlations are computed (2/3): MatrixNumber = 2
matrix used as genetic distance: YMatrix = "fst", where the value is one of
- "fst": Y = Fst
- "log_fst": Y = log(Fst)
- "slatkinlinearfst": Y = Fst/(1-Fst)
- "log_slatkinlinearfst": Y = log(Fst/(1-Fst))
- "nm": Y = (1-Fst)/(2 Fst)
- "custom": Y is user-specified in the project
labels that identify the columns of the YMatrix (list of label names, entered within braces):

YMatrixLabels = {
  "Population1 " "Population4" "Population2"
  "Population8" "Population5"
}

keyword that defines a matrix with which the correlation with the YMatrix is computed:

DistMatMantel={
  0.00
  3.20 0.00
  0.47 0.76 0.00
  0.00 1.23 0.37 0.00
  0.22 0.37 0.21 0.38 0.00
}

labels defining the sub-matrix on which the correlation is computed:

UsedYMatrixLabels={
  "Population1 "
  "Population5"
  "Population8"
}
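The YMatrix options are simple functions of a pairwise Fst value, and UsedYMatrixLabels selects a sub-matrix by label. The Python sketch below implements both. It assumes natural logarithms for the log_ options (the base is not stated here), transforms only off-diagonal entries, and orders the sub-matrix as in the label list given. The helper names are hypothetical.

```python
import math

# Y-matrix options as functions of one pairwise Fst value.
# Assumption: "log" is taken as the natural logarithm.
Y_TRANSFORMS = {
    "fst": lambda f: f,
    "log_fst": lambda f: math.log(f),
    "slatkinlinearfst": lambda f: f / (1 - f),
    "log_slatkinlinearfst": lambda f: math.log(f / (1 - f)),
    "nm": lambda f: (1 - f) / (2 * f),
}


def transform_y(fst, kind):
    """Apply a Y transform to the off-diagonal entries of a square Fst matrix.

    Entries outside a transform's domain (e.g. Fst = 0 for the log and nm
    options) make math.log or the division raise an error.
    """
    f = Y_TRANSFORMS[kind]
    n = len(fst)
    return [[0.0 if i == j else f(fst[i][j]) for j in range(n)] for i in range(n)]


def sub_matrix(matrix, labels, used):
    """Keep the rows/columns whose labels appear in `used`, in `used` order."""
    idx = [labels.index(name) for name in used]  # ValueError if a label is unknown
    return [[matrix[i][j] for j in idx] for i in idx]
```

Labels are matched as exact strings, so "Population1 " with its trailing space must be spelled identically in YMatrixLabels and UsedYMatrixLabels.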

How to cite

Excoffier, L., G. Laval, and S. Schneider (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.

Master's thesis, Heidi Lischer
