====== Arlequin ======

{{arlequin_logo.jpg?200}} \\
**[[http://popgen.unibe.ch/software/arlequin35/|Arlequin]]** \\
[[http://popgen.unibe.ch/software/arlequin35/man/Arlequin35.pdf|manual]] \\
Arlequin ver 3.5 (released 24 February 2010) \\
The goal of Arlequin is to provide the average user in population genetics with quite a large set of basic methods and statistical tests, in order to extract information on genetic and demographic features of a collection of population samples. \\
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. \\
It computes indices of genetic diversity, F-statistics and genetic distances between populations; performs exact tests of Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD) and population differentiation; tests selective neutrality within populations; performs Mantel tests; estimates gametic phase from multilocus genotypes; and estimates demographic parameters from the mismatch distribution.

===== Program information =====

  * program written in C++
  * Windows version (XP, Vista, 7)
  * Linux

\\

===== Data types handled =====

  * DNA sequences
  * RFLP
  * SNP
  * Microsatellite
  * Standard data
  * Allele frequency data

\\
**In haplotypic form:**

  * haplotypes (i.e. combination of alleles at one or more loci)
  * haploid/diploid

\\
**In genotypic form:**

  * genotypes
  * diploid
  * known/unknown gametic phase
  * recessive/no recessive alleles

\\

===== Input Files =====

The input files contain the description of the properties of the data, as well as the raw data themselves. They should have a "*.arp" extension (for ARlequin Project). \\
Each file is structured into two main sections:

  * Profile section (mandatory)
  * Data section (mandatory):
    * Haplotype list (optional)
    * Distance matrices (optional)
    * Samples (mandatory)
    * Genetic structure (optional)
    * Mantel tests (optional)

\\

==== example: ====

The following small example is a project file containing four populations.
The data type is STANDARD genotypic data with unknown gametic phase:

  [Profile]
    Title="Fake HLA data"
    NbSamples=4
    GenotypicData=1
    GameticPhase=0
    DataType=STANDARD
    LocusSeparator=WHITESPACE
    MissingData='?'
  [Data]
    [[Samples]]
      SampleName="A sample of 6 Algerians"
      SampleSize=6
      SampleData={
        1 1 1104 0200
            0700 0301
        3 3 0302 0200
            1310 0402
        4 2 0402 0602
            1502 0602
      }
      SampleName="A sample of 11 Bulgarians"
      SampleSize=11
      SampleData={
        1 1 1103 0301
            0301 0200
        2 4 1101 0301
            0700 0200
        3 1 1500 0502
            0301 0200
        4 1 1103 0301
            1202 0301
        5 1 0301 0200
            1500 0601
        6 3 1600 0502
            1301 0603
      }
      SampleName="A sample of 12 Egyptians"
      SampleSize=12
      SampleData={
        1 2 1104 0301
            1600 0502
        3 1 1303 0301
            1101 0502
        4 3 1502 0601
            1500 0602
        6 1 1101 0301
            1101 0301
        8 4 1302 0502
            1101 0609
        9 1 1500 0302
            0402 0602
      }
      SampleName="A sample of 8 French"
      SampleSize=8
      SampleData={
        219 1 0301 0200
              0101 0501
        239 2 0301 0200
              0301 0200
        249 1 1302 0604
              1500 0602
        250 3 1401 0503
              1301 0603
        254 1 1302 0604
      }
    [[Structure]]
      StructureName="My population structure"
      NbGroups=2
      Group={
        "A sample of 6 Algerians"
        "A sample of 12 Egyptians"
      }
      Group={
        "A sample of 11 Bulgarians"
        "A sample of 8 French"
      }

==== Profile section: ====

  * Title (string within ""): ''Title="title xy"''
  * Number of samples (int 1-1000): ''NbSamples = 3''
  * Type of data (DNA, RFLP, MICROSAT, STANDARD, FREQUENCY): ''DataType = DNA''
  * Haplotypic/genotypic data (0/1): ''GenotypicData = 0''

\\

  * Optionally (__default value__):
    * locus separator (__WHITESPACE__, TAB, NONE, …): ''LocusSeparator = TAB''
    * gametic phase known/unknown (__1__/0): ''GameticPhase = 1''
    * recessive/co-dominant allele (1/__0__): ''RecessiveData = 1''
    * code for recessive allele (string within "", default __"null"__): ''RecessiveAllele = "xxx"''
    * code for missing data (character within "" or %%''%%, default __"?"__ or __%%'?'%%__): ''MissingData = %%'$'%%''
    * frequencies as absolute/relative values (__ABS__/REL): ''Frequency = ABS''
    * significant digits for haplotype frequency outputs (real number between 1e-2 and 1e-7, default __1e-5__): ''FrequencyThreshold = 0.00001''
    * convergence criterion for the EM algorithm (real number between __1e-7__ and 1e-12): ''EpsilonValue = 1e-10''

==== Data section: ====

=== Haplotype list (optional): ===

Defines a list of haplotypes, either inside the project file (intern) or in a separate file (extern).

  * intern:

  [[HaplotypeDefinition]]  #start the section of Haplotype definition
  HaplListName="list1"     #give any name you wish to this list
  HaplList={
    h1 A T                 #on each line, the name of the haplotype is
    h2 G C                 # followed by its definition.
    h3 A G
    h4 A A
    h5 G G
  }

  * extern:

  [[HaplotypeDefinition]]  #start the section of Haplotype definition
  HaplListName="list1"     #give any name you wish to this list
  HaplList = EXTERN "hapl_file.hap"

=== Distance matrix (optional): ===

A matrix of genetic distances between haplotypes can be specified, either inside the project file (intern) or in a separate file (extern).

  * intern:

  [[DistanceMatrix]]   #start the distance matrix definition section
  MatrixName= "none"   # name of the distance matrix
  MatrixSize= 4        # size = number of lines of the distance matrix
  MatrixData={
    h1 h2 h3 h4        # labels of the distance matrix (identifier of the
    0.00000            # haplotypes)
    2.00000 0.00000
    1.00000 2.00000 0.00000
    1.00000 2.00000 1.00000 0.00000
  }

  * extern:

  [[DistanceMatrix]]   #start the distance matrix definition section
  MatrixName= "none"   # name of the distance matrix
  MatrixSize= 4        # size = number of lines of the distance matrix
  MatrixData= EXTERN "mat_file.dis"

=== Samples (mandatory): ===

Defines the haplotypic/genotypic content of the different samples.

  * name for each sample (string within ""): ''SampleName = "name xy"''
  * size of sample (int value): ''SampleSize = 732''
  * data itself (list of haplotypes or genotypes and their frequencies, entered within braces):

  [[Samples]]  #start the samples definition section
  SampleData={
    id1 1 ACGGTGTCGA
    id2 2 ACGGTGTCAG
    id3 8 ACGGTGCCAA
    id4 10 ACAGTGTCAA
    id5 1 GCGGTGTCAA
  }

frequency data:

  SampleData={
    id1 1
    id2 2
    id3 8
    id4 10
    id5 1
  }

  * **haplotypic data:** for each haplotype, its identifier and sample frequency (and, if no haplotype list has been defined, also the allelic content of the haplotype)
  * **genotypic data:** for each genotype, its identifier, sample frequency and allelic content (on two separate lines). The data can be entered as a list of genotypes or as a list of individuals.

  Id1 2 ACTCGGGTTCGCGCGC  # the first pseudo-haplotype
        ACTCGGGCTCACGCGC  # the second pseudo-haplotype

or

  my_id 4 0 0 1 1 0 1
          0 1 0 0 1 1

=== Genetic structure (only required for AMOVA): ===

Specifies the hierarchical genetic structure of the samples. It is possible to define groups of populations.

  * start of the subsection:

  [[Structure]]

  * name for the genetic structure (string within ""): ''StructureName = "An example"''
  * number of groups defined in the structure (int value): ''NbGroups = 5''
  * group definitions (list containing the names of the samples belonging to the group, entered within braces):

  NbGroups=2
  Group={
    population1
    population2
    population3
  }
  Group={
    population4
    population5
  }

=== Mantel test settings ===

Specifies the distance matrices used in a Mantel test. The goal is to compute a correlation between the YMatrix and X1, or a partial correlation between the YMatrix, X1 and X2. The YMatrix can be either a pairwise population FST matrix or a custom matrix entered into the project by the user. X1 (and X2) have to be defined in the project.

  * start of the subsection:

  [[Mantel]]

  * size of the matrices (positive int value): ''MatrixSize= 5''
  * number of matrices among which the correlations are computed (2/3): ''MatrixNumber= 2''
  * matrix that is used as genetic distance: ''YMatrix = "fst"''. Possible values:
    * ''"fst"'': Y = Fst
    * ''"log_fst"'': Y = log(Fst)
    * ''"slatkinlinearfst"'': Y = Fst/(1-Fst)
    * ''"log_slatkinlinearfst"'': Y = log(Fst/(1-Fst))
    * ''"nm"'': Y = (1-Fst)/(2 Fst)
    * ''"custom"'': Y is user-specified in the project
  * labels that identify the columns of the YMatrix (list containing the label names, entered within braces):

  YMatrixLabels = {
    "Population1 "
    "Population4"
    "Population2"
    "Population8"
    "Population5"
  }

  * keyword that defines a matrix with which the correlation with the YMatrix is computed:

  DistMatMantel={
    0.00
    3.20 0.00
    0.47 0.76 0.00
    0.00 1.23 0.37 0.00
    0.22 0.37 0.21 0.38 0.00
  }

  * labels defining the sub-matrix on which the correlation is computed:

  UsedYMatrixLabels={
    "Population1 "
    "Population5"
    "Population8"
  }

===== How to cite =====

Excoffier, L. and H.E.L. Lischer (2010) Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10: 564-567.
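
===== Example: generating a project file from a script =====

The Profile and Samples layout described under Input Files is regular enough that a project file can be written by a short script. The sketch below is not part of Arlequin: the function name ''build_arp'' and its arguments are hypothetical and chosen only for illustration. It assumes haplotypic data with absolute frequencies, and it assumes that ''SampleSize'' equals the sum of the haplotype counts in each sample.

```python
def build_arp(title, samples, data_type="DNA"):
    """Return the text of a minimal haplotypic Arlequin project.

    `samples` maps a sample name to a list of
    (identifier, count, haplotype) tuples.
    All names here are illustrative, not an Arlequin API.
    """
    lines = [
        "[Profile]",
        f'  Title="{title}"',
        f"  NbSamples={len(samples)}",
        f"  DataType={data_type}",
        "  GenotypicData=0",  # 0 = haplotypic data
        "[Data]",
        "  [[Samples]]",
    ]
    for name, haplotypes in samples.items():
        # Assumption: counts are absolute frequencies, so the sample
        # size is taken to be their sum.
        size = sum(count for _, count, _ in haplotypes)
        lines.append(f'    SampleName="{name}"')
        lines.append(f"    SampleSize={size}")
        lines.append("    SampleData={")
        for ident, count, haplotype in haplotypes:
            lines.append(f"      {ident} {count} {haplotype}")
        lines.append("    }")
    return "\n".join(lines) + "\n"


project = build_arp(
    "Toy DNA data",
    {"pop1": [("id1", 1, "ACGGTGTCGA"), ("id2", 2, "ACGGTGTCAG")]},
)
print(project)
```

The returned string can be saved with an ''.arp'' extension. Optional keywords such as ''LocusSeparator'' or ''MissingData'' are left out, so Arlequin's defaults would apply.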