====== MEGA ======
{{mega_logo.jpg}}


\\
**[[http://www.megasoftware.net/|MEGA]]**\\
[[http://www.megasoftware.net/mega4.pdf|documentation]]

\\
Version 5 (April 24, 2011)\\
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.


===== Program information =====
  * Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space)
  * MEGA can also be run on other operating systems for which Windows emulators are available:
    * Macintosh: Windows using VirtualPC
    * Sun Workstation: SoftWindows95
    * Linux: Windows using VMWare


===== Data types handled =====
  * DNA
  * RNA
  * nucleotide
  * distance
  * (protein sequences)


===== Input Files =====
  * ASCII-text files
  * extension: *.MEG
  * Importing Data from Other Formats:
    * CLUSTAL
    * [[NEXUS]]
    * [[PHYLIP]] (Interleaved/Noninterleaved)
    * GCG
    * [[FASTA]]
    * PIR
    * NBRF
    * MSF
    * IG
    * Internet (NCBI) XML format


==== Common Features ====
  * first line: must contain the keyword #MEGA
  * second line: the data file may contain a succinct description of the data (called the Title). The **Title statement** is written according to a set of rules:
    * always begins with ''!Title'' and ends with a semicolon
    * must not occupy more than one line of text
    * must not contain a semicolon inside the statement
    * example: <code>
#mega
!Title This is an example title;
</code>
  * third line: **Description statement**: a more descriptive, multi-line account of the data.
    * always begins with ''!Description'' and ends with a semicolon
    * may occupy multiple lines
    * must not contain a semicolon inside the statement
    * example: <code>
#mega
!Title This is an example title;
!Description This is detailed information about the data file;
</code>
  * **Format statement**: includes information on the type of data present in the file and some of its attributes.
    * written after the Title or the Description statement
    * contains one or more command statements
    * A command statement contains a command and a valid setting keyword (''command=keyword format'')
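
The header rules above can be sketched in Python. This is only an illustrative reader, not MEGA's own parser: the function name ''parse_mega_header'' and the returned dictionary layout are assumptions made for this example.

```python
import re

def parse_mega_header(text):
    """Collect the !Title, !Description and !Format statements of a MEGA file.

    Hypothetical helper for illustration; it only applies the rules listed above.
    """
    stripped = text.lstrip()
    # The first line must contain the #MEGA keyword (keywords are case-insensitive).
    if not stripped.lower().startswith("#mega"):
        raise ValueError("file must start with the #MEGA keyword")

    header = {}
    # Each statement starts with '!' plus a keyword and runs to the next semicolon.
    pattern = re.compile(r"!(Title|Description|Format)\b\s*([^;]*);", re.IGNORECASE)
    for match in pattern.finditer(stripped):
        keyword = match.group(1).lower()
        body = " ".join(match.group(2).split())  # join multi-line text
        if keyword == "format":
            # Allow spaces around '=' as in "Indel = -".
            body = re.sub(r"\s*=\s*", "=", body)
            settings = {}
            for item in body.split():
                if "=" in item:
                    command, value = item.split("=", 1)
                    settings[command.lower()] = value
            header["format"] = settings
        else:
            header[keyword] = body
    return header


sample = (
    "#mega\n"
    "!Title This is an example title;\n"
    "!Description This is detailed\n"
    " information about the data file;\n"
    "!Format DataType=DNA NSeqs=3 Indel = -;\n"
)
header = parse_mega_header(sample)
```

Command names are lower-cased in the result because keywords may be written in any combination of upper- and lower-case letters.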

\\
  * Comments: 
    * anywhere in the data file
    * can span multiple lines
    * enclosed in square brackets ([ and ])
    * can be nested
  * keywords: 
    * written in any combination of lower- and upper-case letters
  * Taxa Names: 
    * ‘#’ Sign: Every label must be written on a new line, and a '#' sign must precede the label
    * no restrictions on the length of the labels
    * not required to be unique (although identical labels may result in ambiguities and should be avoided)
    * must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: ''-, + or .''
    * After the first character, taxa labels may contain the following additional special characters:''_, *, :, ( ), |, \, /''
    * For multiple word labels, an underscore can be used to represent a blank space 
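
As a rough illustration of the comment and taxa-name rules, the following Python sketch removes nested comments and checks a label against the allowed characters. The helper names (''strip_comments'', ''is_valid_label'', ''display_name'') are hypothetical, and the pattern is only a reading of the rules listed above, not MEGA's actual validation.

```python
import re

def strip_comments(text):
    """Remove [ ... ] comments, which may be nested and span several lines."""
    depth = 0
    kept = []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)

# First character: alphanumeric or one of - + .
# Later characters may additionally be _ * : ( ) | \ /
LABEL_PATTERN = re.compile(r"[0-9A-Za-z+.\-][0-9A-Za-z+.\-_*:()|\\/]*")

def is_valid_label(label):
    """Check a taxon label (without the leading '#') against the rules above."""
    return LABEL_PATTERN.fullmatch(label) is not None

def display_name(label):
    """Underscores stand for blank spaces in multi-word labels."""
    return label.replace("_", " ")


cleaned = strip_comments("#mega [outer [inner] still comment] !Title x;")
```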

\\


==== Sequence Input Data ====
  * must consist of two or more sequences of equal length
  * sequences must be aligned
  * written in IUPAC single-letter codes
  * Sequences can be written in any combination of upper- and lower-case letters
  * spaces and tabs are ignored
  * commonly used special symbols: period (.) for identical sites, dash (-) for alignment gaps, question mark (?) for missing data
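
The sequence rules above can be made concrete with a small Python sketch. The function ''expand_identical'' is a hypothetical helper written for this page: it drops spaces and tabs, checks that all sequences have equal length, and replaces the identical-site symbol with the corresponding character of the first sequence.

```python
def expand_identical(sequences, identical="."):
    """Return the sequences with the identical-site symbol written out.

    Assumes the first sequence is the reference and contains no identical symbols.
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")

    # Spaces and tabs inside sequences are ignored.
    cleaned = [s.replace(" ", "").replace("\t", "") for s in sequences]

    reference = cleaned[0]
    if any(len(s) != len(reference) for s in cleaned):
        raise ValueError("aligned sequences must have equal length")

    expanded = [reference]
    for seq in cleaned[1:]:
        expanded.append(
            "".join(ref if ch == identical else ch for ref, ch in zip(reference, seq))
        )
    return expanded


aligned = expand_identical(["ATG GTT-CT", "...\tA..?.."])
```

Gap (-) and missing-data (?) symbols are left untouched, since they carry their own meaning.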

\\
  * **Keywords for Format Statement:**

^ Command ^ Setting ^ Remark ^ Example ^
| DataType | DNA, RNA, nucleotide, protein | | DataType=DNA |
| NSeqs | integer | Number of sequences | NSeqs=85 |
| NTaxa | integer | Synonymous with NSeqs | NTaxa=85 |
| NSites | integer | Number of nucleotides or amino acids | NSites=4592 |
| Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=Coding |
| Indel | single character | dash (-) to identify insertion/deletions | Indel = - |
| Identical | single character | use period (.) to show identity with the first sequence | Identical = . |
| MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . |
| Missing | single character | use question mark (?) to indicate missing data | Missing = ? |
| CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard |


  * **Defining Genes and Domains:**
    * attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data.
    
^ Command ^ Setting ^ Remark ^ Example ^
| Domain | A name | defines a domain with the given name | Domain=first_exon |
| Gene | A name | defines a gene with the given name | Gene=cytb |
| Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding |
| CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 |

  * **Defining Groups:**
    * taxa can be assigned to groups in sequence data files as well as in distance data files.
    * the name of the group is written in a set of curly brackets ({}) following the taxa name. The group name can be joined to the taxa name with an underscore or simply appended.
    * there should be no spaces between the taxa name and group name 

  *  **Labelling Individual Sites:**
    * The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites. 
    * Each site can be associated with only one label 
    * A label can be a letter or a number.

=== example ===
<code>
!Gene=FirstGene Domain=Exon1 Property=Coding;
#Human_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCTCAAT
#Mouse_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCCCAAT
#Chicken_{Aves} ATGGTTTCTAGTCAGCTCACCATGATAGGTCTCAAT

!Gene=SecondGene Domain=Intron Property=Noncoding;
#Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT
#Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT
#Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT

!Gene=ThirdGene Domain=Exon2 Property=Coding;
#Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA
#Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA
#Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA
!Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++;
</code>
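
To illustrate how the group notation in the example above can be read, here is a minimal Python sketch. The function ''split_taxon_line'' is a hypothetical helper, and it assumes each taxon line holds a label followed by the sequence on the same line, as in the example.

```python
def split_taxon_line(line):
    """Split '#Label_{Group} SEQUENCE' into (label, group, sequence).

    group is None when the label carries no {group} suffix.
    """
    if not line.startswith("#"):
        raise ValueError("taxon lines start with '#'")
    parts = line[1:].split(None, 1)
    if not parts:
        raise ValueError("missing taxon label")

    label = parts[0]
    sequence = ""
    if len(parts) > 1:
        # Spaces and tabs inside the sequence are ignored.
        sequence = parts[1].replace(" ", "").replace("\t", "")

    group = None
    if label.endswith("}") and "{" in label:
        label, group = label[:-1].split("{", 1)
        # The underscore joining label and group is optional.
        if label.endswith("_"):
            label = label[:-1]
    return label, group, sequence


human = split_taxon_line("#Human_{Mammal} ATGGTT")
chicken = split_taxon_line("#Chicken{Aves} ATG GTT")
mouse = split_taxon_line("#Mouse GTAGCG")
```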


==== Distance Input Data ====
  * distances are written as a lower-left or an upper-right triangular matrix
  * After writing the ''#mega'', ''!Title'', ''!Description'', and ''!Format'' commands (some of which are optional), you then need to write all the taxa names (see below)
  * Taxa names are followed by the distance matrix

\\
  * **Keywords for Format Statement:**

^ Command ^ Setting ^ Remark ^ Example ^ 
| DataType | Distance | Specifies that the distance data is in the file | DataType=distance | 
| NSeqs | integer | Number of sequences | NSeqs=85 | 
| NTaxa | integer | Same as NSeqs | NTaxa=85 | 
| DataFormat | Lowerleft, upperright | Specifies whether the data is in lower left triangular matrix or the upper right triangular matrix | DataFormat=lowerleft |
 

  * **Defining Groups:**
    * see above

=== example ===
<code>
#mega
!Title: Concatenated Files;
!Format DataType=Distance DataFormat=LowerLeft NTaxa=6;

#Rodent
#Primate
#Lagomorpha
#Artiodactyla
#Carnivora
#Perissodactyla
      
0.514       
0.535 0.436       
0.530 0.388 0.418       
0.521 0.353 0.417 0.345       
0.500 0.331 0.402 0.327 0.349
</code>
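
The lower-left layout in the example above can be expanded into a full symmetric matrix. The following Python sketch assumes the taxa names and matrix rows have already been separated from the header; ''read_lower_left'' is a hypothetical helper, not part of MEGA.

```python
def read_lower_left(rows, n_taxa):
    """Build a symmetric n_taxa x n_taxa matrix from lower-left triangular rows.

    Row i (counting taxa from 0) holds i distances, to taxa 0 .. i-1,
    so the first taxon contributes an empty row, which is skipped here.
    """
    data_rows = [row.split() for row in rows if row.strip()]
    if len(data_rows) != n_taxa - 1:
        raise ValueError("expected %d non-empty rows" % (n_taxa - 1))

    matrix = [[0.0] * n_taxa for _ in range(n_taxa)]
    for i, values in enumerate(data_rows, start=1):
        if len(values) != i:
            raise ValueError("row for taxon %d must hold %d distances" % (i, i))
        for j, value in enumerate(values):
            matrix[i][j] = matrix[j][i] = float(value)
    return matrix


taxa = ["Rodent", "Primate", "Lagomorpha", "Artiodactyla", "Carnivora", "Perissodactyla"]
rows = [
    "",
    "0.514",
    "0.535 0.436",
    "0.530 0.388 0.418",
    "0.521 0.353 0.417 0.345",
    "0.500 0.331 0.402 0.327 0.349",
]
matrix = read_lower_left(rows, len(taxa))
index = {name: i for i, name in enumerate(taxa)}
rodent_primate = matrix[index["Rodent"]][index["Primate"]]
```

An upper-right matrix could be handled the same way by reading row i as the distances from taxon i to the taxa that follow it.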


===== How to cite =====
Citation for MEGA 5:
  * Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution doi: 10.1093/molbev/msr121. 

\\
Citation for MEGA 4:
  * Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.