====== MEGA ======
{{mega_logo.jpg}}
\\
**[[http://www.megasoftware.net/|MEGA]]**\\
[[http://www.megasoftware.net/mega4.pdf|documentation]]
\\
Version 5 (April 24, 2011)\\
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.
===== Program information =====
* Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space)
* MEGA can also be run on other operating systems for which Windows emulators are available:
* Macintosh: Windows using VirtualPC
* Sun Workstation: SoftWindows95
* Linux: Windows using VMWare
===== Data type handled =====
* DNA
* RNA
* nucleotide
* distance
* (protein sequences)
===== Input Files =====
* ASCII-text files
* extension: *.MEG
* Importing Data from Other Formats:
* CLUSTAL
* [[NEXUS]]
* [[PHYLIP]] (Interleaved/Noninterleaved)
* GCG
* [[FASTA]]
* PIR
* NBRF
* MSF
* IG
* Internet (NCBI) XML format
==== Common Features ====
* first line: must contain the keyword #MEGA
* second line: data file may contain a succinct description of the data (called Title). The **Title statement** is written according to a set of rules:
* always begins with ''!Title'' and ends with a semicolon
* must not occupy more than one line of text
* must not contain a semicolon inside the statement
* example:
#mega
!Title This is an example title;
* third line: **Description statement**: more descriptive multi-line account of the data.
* always begins with ''!Description'' and ends with a semicolon
* may occupy multiple lines
* must not contain a semicolon inside the statement
* example:
#mega
!Title This is an example title;
!Description This is detailed information about the data file;
* **Format statement**: which includes information on the type of data present in the file and some of its attributes.
* written after the Title or the Description statement
* contains one or more command statements
* A command statement contains a command and a valid setting keyword (''command=keyword format'')
\\
* Comments:
* anywhere in the data file
* can span multiple lines
* enclosed in square brackets ([ and ])
* can be nested
* keywords:
* written in any combination of lower- and upper-case letters
* Taxa Names:
* ‘#’ sign: every label must be written on a new line, and a '#' sign must precede the label
* no restrictions on the length of the labels
* not required to be unique (although identical labels may result in ambiguities and should be avoided)
* must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: ''-, + or .''
* After the first character, taxa labels may contain the following additional special characters:''_, *, :, ( ), |, \, /''
* For multiple word labels, an underscore can be used to represent a blank space
\\
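The comment and taxa-name rules above can be sketched in Python. This is a minimal illustration, not MEGA's own parser: the function names ''strip_comments'' and ''is_valid_label'' are hypothetical, and the label check assumes the group suffix in curly brackets has already been removed.

```python
import re


def strip_comments(text: str) -> str:
    """Remove [ ... ] comments, including nested ones.

    A stray closing bracket outside any comment is kept as-is
    (an assumption; the rules above do not cover that case).
    """
    kept = []
    depth = 0  # current nesting level of square brackets
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


# First character: alphanumeric or one of  - + .
# Later characters: additionally  _ * : ( ) | \ /
_LABEL_RE = re.compile(r"[0-9A-Za-z\-+.][0-9A-Za-z\-+._*:()|\\/]*")


def is_valid_label(label: str) -> bool:
    """Check a taxa label (without the leading '#') against the rules above."""
    return _LABEL_RE.fullmatch(label) is not None


print(strip_comments("ATG[note [nested] more]CCC"))
print(is_valid_label("Homo_sapiens"), is_valid_label("_leading"))
```

Counting bracket depth, rather than using a single regular expression, is what lets the sketch handle nested comments.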
==== Sequence Input Data ====
* must consist of two or more sequences of equal length
* sequences must be aligned
* written in IUPAC single-letter codes
* Sequences can be written in any combination of upper- and lower-case letters
* spaces and tabs are ignored
* commonly used special symbols: period (.) for identical sites, dash (-) for alignment gaps, question mark (?) for missing data
\\
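To illustrate the identical-site symbol, the following Python sketch replaces each period with the corresponding residue of the first sequence. The function name ''expand_identical'' is hypothetical, and the sketch only assumes what is stated above: sequences are aligned, of equal length, and spaces and tabs are ignored.

```python
def expand_identical(sequences, identical="."):
    """Replace the identical-site symbol with the residue of the first sequence."""
    if not sequences:
        return []
    # Spaces and tabs carry no meaning in the sequence data, so drop them first.
    cleaned = [s.replace(" ", "").replace("\t", "") for s in sequences]
    reference = cleaned[0]
    if any(len(s) != len(reference) for s in cleaned):
        raise ValueError("sequences must be aligned and of equal length")
    expanded = [reference]
    for seq in cleaned[1:]:
        expanded.append(
            "".join(ref if ch == identical else ch for ch, ref in zip(seq, reference))
        )
    return expanded


print(expand_identical(["ATG-CA", ".C. -?."]))
```

Gap (-) and missing-data (?) symbols are passed through unchanged; only the identical-site symbol is rewritten.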
* **Keywords for Format Statement:**
^ Command ^ Setting ^ Remark ^ Example ^
| DataType | DNA, RNA, nucleotide, protein | | DataType=DNA |
| NSeqs | integer | Number of sequences | NSeqs=85 |
| NTaxa | integer | Synonymous with NSeqs | NTaxa=85 |
| NSites | integer | Number of nucleotides | NSites=4592 |
| Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=Coding |
| Indel | single character | use dash (-) to identify insertions/deletions | Indel = - |
| Identical | single character | use period (.) to show identity with the first sequence | Identical = . |
| MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . |
| Missing | single character | use question mark (?) to indicate missing data | Missing = ? |
| CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard |
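A Format statement built from these keywords can be read into a dictionary along the following lines. This is a rough Python sketch under stated assumptions: the function name ''parse_format'' is hypothetical, settings are assumed to contain no whitespace, and comments are assumed to have been removed beforehand.

```python
import re


def parse_format(statement: str) -> dict:
    """Parse '!Format Command=Setting ...;' into a dict keyed by lower-case command."""
    body = statement.strip()
    if not body.lower().startswith("!format"):
        raise ValueError("statement does not start with !Format")
    if not body.endswith(";"):
        raise ValueError("statement must end with a semicolon")
    body = body[len("!format"):-1]
    # Normalise 'Indel = -' to 'Indel=-' so each pair becomes one token.
    body = re.sub(r"\s*=\s*", "=", body)
    settings = {}
    for token in body.split():
        command, _, value = token.partition("=")
        # Keywords may be written in any mix of upper and lower case.
        settings[command.lower()] = value
    return settings


print(parse_format("!Format DataType=DNA NSeqs=85 Indel = - Missing = ?;"))
```

Lower-casing the command names reflects the rule that keywords are case-insensitive; the setting values are left exactly as written.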
* **Defining Genes and Domains:**
* attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data.
^ Command ^ Setting ^ Remark ^ Example ^
| Domain | A name | defines a domain with the given name | Domain=first_exon |
| Gene | A name | defines a gene with the given name | Gene=cytb |
| Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding |
| CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 |
* **Defining Groups:**
* taxa can be assigned to groups in sequence data files as well as in distance data files.
* the name of the group is written in a set of curly brackets ({}) following the taxa name. The group name can be attached to the taxa name with an underscore or simply appended.
* there should be no spaces between the taxa name and group name
* **Labelling Individual Sites:**
* The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites.
* Each site can be associated with only one label
* A label can be a letter or a number.
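The group notation described above can be separated from the taxa name with a short Python sketch. The function name ''split_group'' is hypothetical. The sketch treats a single underscore directly before the opening bracket as the connector, which is an assumption: a taxa name that itself ends in an underscore would lose that character.

```python
import re

# Taxa name, optional connecting underscore, then {group} at the end of the label.
_GROUP_RE = re.compile(r"^(?P<taxon>.*?)_?\{(?P<group>[^{}]*)\}$")


def split_group(label: str):
    """Return (taxon, group); group is None when the label has no {group} suffix."""
    match = _GROUP_RE.match(label)
    if match is None:
        return label, None
    return match.group("taxon"), match.group("group")


for label in ("Human_{Mammal}", "Chicken{Aves}", "Mouse"):
    print(split_group(label))
```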
=== example ===
!Gene=FirstGene Domain=Exon1 Property=Coding;
#Human_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCTCAAT
#Mouse_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCCCAAT
#Chicken_{Aves} ATGGTTTCTAGTCAGCTCACCATGATAGGTCTCAAT
!Gene=SecondGene Domain=Intron Property=Noncoding;
#Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT
#Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT
#Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT
!Gene=ThirdGene Domain=Exon2 Property=Coding;
#Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA
#Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA
#Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA
!Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++;
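A file laid out like this example can be collected into one sequence per taxon with a Python sketch such as the following. The function name ''read_sequences'' is hypothetical, and the sketch makes simplifying assumptions: every sequence line carries its label and residues on the same line, command statements starting with ''!'' are skipped rather than interpreted, and comments have already been removed.

```python
def read_sequences(lines):
    """Concatenate the sequence blocks of each taxon in order of appearance."""
    sequences = {}
    for raw in lines:
        line = raw.strip()
        # Only '#label residues' lines are used; '#mega' and '!' lines are skipped.
        if not line.startswith("#") or line.lower() == "#mega":
            continue
        parts = line[1:].split(None, 1)
        if not parts:
            continue
        label = parts[0]
        residues = "".join(parts[1].split()) if len(parts) > 1 else ""
        # Drop a trailing {group} suffix and its connecting underscore, if present.
        taxon = label.split("{")[0].rstrip("_")
        sequences[taxon] = sequences.get(taxon, "") + residues
    return sequences


example = [
    "#mega",
    "!Gene=FirstGene Domain=Exon1 Property=Coding;",
    "#Human_{Mammal} ATGGTT",
    "#Mouse_{Mammal} ATGGTC",
    "!Gene=SecondGene Domain=Intron Property=Noncoding;",
    "#Human ATTCCC",
    "#Mouse GTAGCG",
]
print(read_sequences(example))
```

Because the group suffix is stripped before the lookup, ''Human_{Mammal}'' in the first block and ''Human'' in later blocks end up in the same entry.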
==== Distance Input Data ====
* written as a lower-left or an upper-right triangular matrix
* After writing the #mega, !Title, !Description, and !Format commands (some of which are optional), write all the taxa names (see below)
* Taxa names are followed by the distance matrix
\\
* **Keywords for Format Statement:**
^ Command ^ Setting ^ Remark ^ Example ^
| DataType | Distance | Specifies that the file contains distance data | DataType=distance |
| NSeqs | integer | Number of sequences | NSeqs=85 |
| NTaxa | integer | Same as NSeqs | NTaxa=85 |
| DataFormat | LowerLeft, UpperRight | Specifies whether the data is in a lower-left or an upper-right triangular matrix | DataFormat=lowerleft |
* **Defining Groups:**
* see above
=== example ===
#mega
!Title: Concatenated Files;
!Format DataType=Distance DataFormat=LowerLeft NTaxa=6;
#Rodent
#Primate
#Lagomorpha
#Artiodactyla
#Carnivora
#Perissodactyla
0.514
0.535 0.436
0.530 0.388 0.418
0.521 0.353 0.417 0.345
0.500 0.331 0.402 0.327 0.349
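The lower-left layout of this example can be expanded into a full symmetric matrix as sketched below in Python. The function name ''read_lower_left'' is hypothetical, and the sketch assumes, as in the example, that the diagonal is omitted, so a file with n taxa holds n(n-1)/2 values.

```python
def read_lower_left(rows, n_taxa):
    """Build a full symmetric distance matrix from lower-left triangular rows."""
    values = [float(token) for row in rows for token in row.split()]
    expected = n_taxa * (n_taxa - 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} distances, found {len(values)}")
    matrix = [[0.0] * n_taxa for _ in range(n_taxa)]
    stream = iter(values)
    for i in range(1, n_taxa):
        for j in range(i):
            # Mirror each value so matrix[i][j] == matrix[j][i].
            matrix[i][j] = matrix[j][i] = next(stream)
    return matrix


rows = [
    "0.514",
    "0.535 0.436",
    "0.530 0.388 0.418",
    "0.521 0.353 0.417 0.345",
    "0.500 0.331 0.402 0.327 0.349",
]
matrix = read_lower_left(rows, 6)
print(matrix[1][0], matrix[0][1], matrix[5][4])
```

Here row and column 0 correspond to ''Rodent'', 1 to ''Primate'', and so on in the order the taxa names are listed.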
===== How to cite =====
Citation for MEGA 5:
* Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution doi: 10.1093/molbev/msr121.
\\
Citation for MEGA 4:
* Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.