MEGA
Version 4 (May 1, 2008)
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.
Program information
- Windows 95/98, NT, 2000, XP, and Vista (with at least 64 MB of RAM, 20 MB of available hard disk space)
- MEGA can also be run on other operating systems for which Windows emulators are available:
- Macintosh: Windows using VirtualPC
- Sun Workstation: SoftWindows95
- Linux: Windows using VMWare
 
Data type handled
- DNA
- RNA
- nucleotide
- (protein sequences)
Input Files
- ASCII-text files
- extension: *.MEG
Common Features
- first line: must contain the keyword #MEGA
- second line: the data file may contain a succinct description of the data (called the Title). The Title statement is written according to a set of rules:
- always begins with !Title and ends with a semicolon
- must not occupy more than one line of text
- must not contain a semicolon inside the statement
- example:
#mega
!Title This is an example title;
 
- third line: the Description statement gives a more descriptive, multi-line account of the data:
- always begins with !Description and ends with a semicolon
- may occupy multiple lines
- must not contain a semicolon inside the statement
- example:
#mega
!Title This is an example title;
!Description This is detailed information about the data file;
 
- Format statement: includes information on the type of data present in the file and some of its attributes:
- written after the Title or the Description statement
- contains one or more command statements
- a command statement contains a command and a valid setting keyword (command=keyword format)
 
- Comments: can be written anywhere in the data file and can span multiple lines. They must always be enclosed in square brackets ([ and ]) and can be nested.
- keywords: can be written in any combination of lower- and upper-case letters
- Rules for Taxa Names: Distance matrices as well as sequence data may come from species, populations, or individuals. These evolutionary entities are designated as OTUs (Operational Taxonomic Units) or taxa. Each taxon must have an identification tag, i.e., a taxon label, according to the following conventions:
- '#' Sign: Every label must be written on a new line, and a '#' sign must precede the label. There are no restrictions on the length of the labels in the data file. The labels are not required to be unique, although identical labels may result in ambiguities and should be avoided.
- Characters: Taxa labels must start with an alphanumeric character (0-9, a-z, or A-Z) or one of the special characters -, +, or . (hyphen, plus, or period). After the first character, taxa labels may also contain the following special characters: _, *, :, ( ), |, \, /. For multiple-word labels, an underscore can be used to represent a blank space.
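
The rules above for comments, the Title statement, and taxa labels can be sketched in Python. This is only an illustrative check under the rules as stated on this page, not MEGA's own parser; the function names `strip_comments`, `is_valid_title`, and `is_valid_label` are hypothetical. The label check applies to the taxon name itself, without the leading '#' and without any group name in curly brackets.

```python
import re


def strip_comments(text: str) -> str:
    """Remove [ ... ] comments, which may be nested and span lines."""
    kept = []
    depth = 0  # current nesting level of square brackets
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError("unmatched ']'")
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    if depth != 0:
        raise ValueError("unclosed '['")
    return "".join(kept)


def is_valid_title(line: str) -> bool:
    """One line, starts with !Title, ends with the only semicolon."""
    body = line.strip()
    return (
        "\n" not in body
        and body.lower().startswith("!title")  # keywords are case-insensitive
        and body.endswith(";")
        and body.count(";") == 1
    )


# First character: alphanumeric or one of - + .
# Later characters may also be one of _ * : ( ) | \ /
LABEL_RE = re.compile(r"[0-9A-Za-z.+-][0-9A-Za-z.+\-_*:()|\\/]*")


def is_valid_label(label: str) -> bool:
    """Check a taxon label (without '#') against the character rules."""
    return LABEL_RE.fullmatch(label) is not None
```

For instance, `is_valid_label("Homo_sapiens")` is accepted, while a label that starts with an underscore or contains a blank space is rejected.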
 
Sequence Input Data
- must consist of two or more sequences of equal length
- sequences must be aligned
- written in IUPAC single-letter codes
- Sequences can be written in any combination of upper- and lower-case letters
- spaces and tabs are ignored
- commonly used special symbols: period (.) → identical sites, dash (-) → alignment gaps, question mark (?) → missing data
- Keywords for Format Statement:
| Command | Setting | Remark | Example | 
|---|---|---|---|
| DataType | DNA, RNA, nucleotide, protein | Type of data present in the file | DataType=DNA |
| NSeqs | integer | Number of sequences | NSeqs=85 | 
| NTaxa | integer | Synonymous with NSeqs | NTaxa=85 | 
| NSites | integer | Number of nucleotides | Nsites=4592 | 
| Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=Coding | 
| Indel | single character | dash (-) to identify insertion/deletions | Indel = - | 
| Identical | single character | use period (.) to show identity with the first sequence | Identical = . | 
| MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . | 
| Missing | single character | use question mark (?) to indicate missing data | Missing = ? | 
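
As a rough illustration of the command=keyword format in the table above, the following Python sketch reads a Format statement into a dictionary and expands the identical-site symbol against the first sequence. It assumes the statement begins with !Format and ends with a semicolon, by analogy with the Title and Description statements; that opening keyword, the function names `parse_format` and `expand_identical`, and the choice to fold synonyms together are assumptions made for this example.

```python
# Synonymous commands from the table, folded into one canonical name.
SYNONYMS = {"ntaxa": "nseqs", "matchchar": "identical"}


def parse_format(statement: str) -> dict[str, str]:
    """Turn '!Format Cmd=Value ...;' into {command: setting}."""
    body = statement.strip()
    if not body.lower().startswith("!format") or not body.endswith(";"):
        raise ValueError("expected a statement of the form '!Format ...;'")
    body = body[len("!format"):-1]
    # Allow both 'Indel=-' and 'Indel = -' by isolating every '='.
    tokens = body.replace("=", " = ").split()
    if len(tokens) % 3 != 0:
        raise ValueError("settings must be written as command=keyword")
    settings = {}
    for i in range(0, len(tokens), 3):
        command, equals, value = tokens[i:i + 3]
        if equals != "=":
            raise ValueError(f"malformed setting near {command!r}")
        key = command.lower()  # keywords are case-insensitive
        settings[SYNONYMS.get(key, key)] = value
    return settings


def expand_identical(seqs: list[str], identical: str = ".") -> list[str]:
    """Replace the identical-site symbol with the first sequence's character."""
    first = seqs[0]
    expanded = [first]
    for seq in seqs[1:]:
        expanded.append(
            "".join(f if c == identical else c for c, f in zip(seq, first))
        )
    return expanded
```

Values are kept as strings; a caller that needs NSeqs or NSites as numbers would convert them with `int()`.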
- Defining Genes and Domains:
- attributes of different sites (and groups of sites, termed domains) are specified within the data “on the spot” rather than in an attributes block before or after the actual data.
 
| Command | Setting | Remark | Example | 
|---|---|---|---|
| Domain | A name | defines a domain with the given name | Domain=first_exon | 
| Gene | A name | defines a gene with the given name | Gene=cytb | 
| Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding | 
| CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 | 
- Defining Groups:
- taxa can be assigned to groups in sequence data files as well as in distance data files
- the name of the group is written in a set of curly brackets ({}) following the taxa name. The group name can be attached to the taxa name using an underscore or can simply be appended.
- there should be no spaces between the taxa name and group name
 
- Labelling Individual Sites:
- The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites.
- Each site can be associated with only one label
- A label can be a letter or a number.
example
!Gene=FirstGene Domain=Exon1 Property=Coding;
#Human_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCTCAAT
#Mouse_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCCCAAT
#Chicken_{Aves} ATGGTTTCTAGTCAGCTCACCATGATAGGTCTCAAT
!Gene=SecondGene Domain=Intron Property=Noncoding;
#Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT
#Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT
#Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT
!Gene=ThirdGene Domain=Exon2 Property=Coding;
#Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA
#Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA
#Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA
!Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++;
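
To show how a file laid out like the example could be read, here is a minimal Python sketch. It joins the sequence blocks for each taxon, picks up group names from the curly brackets, collects !Label lines, and checks that all sequences end up the same length. The names `parse_mega` and `sites_with_label` are hypothetical, and the sketch assumes that comments have already been removed and that each taxon line carries its label and sequence on one line; it is not MEGA's own reader.

```python
def parse_mega(text: str):
    """Return (sequences, groups, label_lines) from MEGA-style text."""
    seqs: dict[str, str] = {}
    groups: dict[str, str] = {}
    label_lines: list[str] = []
    pending = None  # a '!' statement still waiting for its closing ';'
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.lower() == "#mega":
            continue
        if pending is not None or line.startswith("!"):
            # Statements such as !Description may run over several lines.
            pending = line if pending is None else pending + " " + line
            if pending.endswith(";"):
                statement, pending = pending[1:-1].strip(), None
                if statement.lower().startswith("label"):
                    label_lines.append("".join(statement[5:].split()))
            continue
        if not line.startswith("#"):
            raise ValueError(f"unexpected line: {line!r}")
        name, seq = line[1:].split(None, 1)
        if name.endswith("}") and "{" in name:
            # '#Human_{Mammal}' or '#Human{Mammal}' -> name Human, group Mammal
            name, group = name[:-1].split("{", 1)
            name = name.rstrip("_")
            groups[name] = group
        # Spaces and tabs inside the sequence are ignored.
        seqs[name] = seqs.get(name, "") + "".join(seq.split())
    if len(seqs) < 2 or len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("need two or more sequences of equal length")
    return seqs, groups, label_lines


def sites_with_label(labels: str, wanted: str) -> list[int]:
    """1-based positions within a label line that carry the given label."""
    return [i for i, ch in enumerate(labels, start=1) if ch == wanted]
```

Because each site carries exactly one label character, `sites_with_label` can pull out a non-contiguous set of sites with a single pass over the label line. How a label line lines up with the sites of the whole alignment is left to the caller here.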
How to cite
- When referring to MEGA in the main text of your publication, you may choose a format such as:
 
Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007).
- When including a MEGA citation in the Literature Cited/Bibliography section, you may use the following:
 
Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24:1596-1599. (Publication PDF at http://www.kumarlab.net/publications)
mega.1211209796.txt.gz · Last modified: 2008/07/22 13:30 (external edit)
                
                 
 