Differences

This shows you the differences between two versions of the page.

--- mega [2008/05/16 16:33] – heidi
+++ mega [2011/07/07 11:50] (current) – heidi
@@ Line 8: / Line 8: @@
 \\
-Version 4 (May 1, 2008)\\
+Version 5 (April 24, 2011)\\
 MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.
@@ Line 15: / Line 16: @@
 ===== Program information =====
-  * Windows 95/98, NT, 2000, XP, and Vista (with at least 64 MB of RAM, 20 MB of available hard disk space)
+  * Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space)
   * MEGA also can be run on other operating systems for which Windows emulators are available:
     * Macintosh: Windows using VirtualPC
     * Sun Workstation: SoftWindows95
     * Linux: Windows using VMWare
@@ Line 27: / Line 29: @@
   * RNA
   * nucleotide
+  * distance
   * (protein sequences)
 ===== Input Files =====
+  * ASCII-text files
+  * extension: *.MEG
+  * Importing Data from Other Formats:
+    * CLUSTAL
+    * [[NEXUS]]
+    * [[PHYLIP]] (Interleaved/Noninterleaved)
+    * GCG
+    * [[FASTA]]
+    * PIR
+    * NBRF
+    * MSF
+    * IG
+    * Internet (NCBI) XML format
+==== Common Features ====
+  * first line: must contain the keyword #MEGA
+  * second line: data file may contain a succinct description of the data (called Title). The **Title statement** is written according to a set of rules:
+    * always begins with ''!Title'' and ends with a semicolon
+    * must not occupy more than one line of text
+    * must not contain a semicolon inside the statement
+    * example: <code>
+#mega
+!Title This is an example title;
+</code>
+  * third line: **Description statement**: more descriptive multi-line account of the data.
+    * always begins with ''!Description'' and ends with a semicolon
+    * may occupy multiple lines
+    * must not contain a semicolon inside the statement
+    * example: <code>
+#mega
+!Title This is an example title;
+!Description This is detailed information about the data file;
+</code>
+  * **Format statement**: which includes information on the type of data present in the file and some of its attributes.
+    * written after the Title or the Description statement
+    * contains one or more command statements
+    * A command statement contains a command and a valid setting keyword, written in the format ''command=keyword''
+\\
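The Title and Description rules above can be checked with a few lines of Python. This is only a minimal sketch, not MEGA's own parser: the function name `parse_mega_header` is hypothetical, and the sketch assumes that comments have already been removed from the text.

```python
import re


def parse_mega_header(text):
    """Return (title, description) from the start of a MEGA data file.

    Hypothetical helper: it checks only the rules listed above.
    Either value is None when the statement is absent.
    """
    lines = text.strip().splitlines()
    # Keywords may be written in any combination of upper and lower case.
    if not lines or not lines[0].strip().lower().startswith("#mega"):
        raise ValueError("first line must contain the keyword #MEGA")
    body = "\n".join(lines[1:])

    title = None
    match = re.search(r"!title\b([^;]*);", body, flags=re.IGNORECASE)
    if match:
        if "\n" in match.group(1):
            raise ValueError("Title statement must fit on one line")
        title = match.group(1).strip()

    description = None
    match = re.search(r"!description\b([^;]*);", body, flags=re.IGNORECASE)
    if match:
        # A Description may span several lines; collapse the whitespace.
        description = " ".join(match.group(1).split())
    return title, description
```

Because a statement ends at the first semicolon, a semicolon inside a Title or Description would cut the statement short, which is why the rules forbid it.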
+  * Comments:
+    * anywhere in the data file
+    * can span multiple lines
+    * enclosed in square brackets ([ and ])
+    * can be nested
+  * keywords:
+    * written in any combination of lower- and upper-case letters
+  * Taxa Names:
+    * ‘#’ Sign: Every label must be written on a new line, and a '#' sign must precede the label
+    * no restrictions on the length of the labels
+    * not required to be unique (although identical labels may result in ambiguities and should be avoided)
+    * must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: ''-, + or .''
+    * After the first character, taxa labels may contain the following additional special characters: ''_, *, :, ( ), |, \, /''
+    * For multiple word labels, an underscore can be used to represent a blank space
+\\
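The comment and taxa-name rules above translate into short helpers. The following Python sketch is illustrative only; the names `strip_comments`, `is_valid_label` and `display_name` are hypothetical and are not part of MEGA.

```python
import re


def strip_comments(text):
    """Remove [ ... ] comments, which may be nested and span several lines."""
    kept = []
    depth = 0  # current nesting level of square brackets
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


# First character: alphanumeric or one of - + .
# Later characters may additionally be one of _ * : ( ) | \ /
LABEL_RE = re.compile(r"[0-9A-Za-z+.\-][0-9A-Za-z+.\-_*:()|\\/]*")


def is_valid_label(label):
    """Check a taxa label (written without the leading '#')."""
    return LABEL_RE.fullmatch(label) is not None


def display_name(label):
    """In multi-word labels an underscore represents a blank space."""
    return label.replace("_", " ")
```

For example, `strip_comments("AC[note [inner] more]GT")` returns `"ACGT"`, and `display_name("Homo_sapiens")` returns `"Homo sapiens"`.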
+==== Sequence Input Data ====
+  * must consist of two or more sequences of equal length
+  * sequences must be aligned
+  * written in IUPAC single-letter codes
+  * Sequences can be written in any combination of upper- and lower-case letters
+  * spaces and tabs are ignored
+  * commonly used special symbols: period (.) -> identical sites, dash (-) -> alignment gaps, question mark (?) -> missing data
+\\
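The sequence rules above can be expressed as a small normalization step. This is a minimal Python sketch, assuming that a period marks identity with the first sequence in the file; the function name `normalize_alignment` is hypothetical.

```python
def normalize_alignment(seqs, identical="."):
    """Clean and check a list of (label, sequence) pairs.

    Spaces and tabs are dropped, letters are upper-cased, and the
    identical-site symbol is replaced by the matching character of the
    first sequence (an assumption of this sketch).
    """
    cleaned = [(label, "".join(seq.split()).upper()) for label, seq in seqs]
    if len(cleaned) < 2:
        raise ValueError("need two or more sequences")
    reference = cleaned[0][1]
    if any(len(seq) != len(reference) for _, seq in cleaned):
        raise ValueError("sequences must be of equal length")

    expanded = []
    for label, seq in cleaned:
        full = "".join(
            ref if ch == identical else ch for ch, ref in zip(seq, reference)
        )
        expanded.append((label, full))
    return expanded
```

Gap (-) and missing-data (?) symbols pass through unchanged, so later steps can still treat them specially.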
+  * **Keywords for Format Statement:**
+^ Command ^ Setting ^ Remark ^ Example ^
+| DataType | DNA, RNA, nucleotide, protein | | DataType=DNA |
+| NSeqs | integer | Number of sequences | NSeqs=85 |
+| NTaxa | integer | Synonymous with NSeqs | NTaxa=85 |
+| NSites | integer | Number of nucleotides | NSites=4592 |
+| Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=Coding |
+| Indel | single character | dash (-) to identify insertion/deletions | Indel = - |
+| Identical | single character | use period (.) to show identity with the first sequence | Identical = . |
+| MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . |
+| Missing | single character | use question mark (?) to indicate missing data | Missing = ? |
+| CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard |
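A Format statement built from the commands in this table can be split into command/setting pairs as sketched below. The sketch assumes the statement has the form `!Format command=keyword ... ;` and that settings contain no spaces; the function name `parse_format` is hypothetical.

```python
import re


def parse_format(statement):
    """Parse a Format statement into a dict keyed by lower-case command.

    Spaces around '=' are allowed, as in 'Indel = -'.
    """
    text = statement.strip()
    if not text.lower().startswith("!format") or not text.endswith(";"):
        raise ValueError("expected a statement of the form '!Format ... ;'")
    body = text[len("!format"):-1]  # drop the keyword and the final ';'
    settings = {}
    for command, value in re.findall(r"(\w+)\s*=\s*(\S+)", body):
        # Commands are case-insensitive, so store them in lower case.
        settings[command.lower()] = value
    return settings
```

Values are returned as strings; a caller would convert `nseqs`, `ntaxa` and `nsites` to integers where needed.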
+  * **Defining Genes and Domains:**
+    * attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data.
+^ Command ^ Setting ^ Remark ^ Example ^
+| Domain | A name | defines a domain with the given name | Domain=first_exon |
+| Gene | A name | defines a gene with the given name | Gene=cytb |
+| Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding |
+| CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 |
+  * **Defining Groups:**
+    * taxa can be assigned to groups in sequence data files as well as in distance data files.
+    * the name of the group is written in a set of curly brackets ({}) following the taxa name. The group name can be attached to the taxa name with an underscore or simply appended to it.
+    * there should be no spaces between the taxa name and group name
+  *  **Labelling Individual Sites:**
+    * The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites.
+    * Each site can be associated with only one label
+    * A label can be a letter or a number.
 === example ===
+<code>
+!Gene=FirstGene Domain=Exon1 Property=Coding;
+#Human_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCTCAAT
+#Mouse_{Mammal} ATGGTTTCTAGTCAGGTCACCATGATAGGTCCCAAT
+#Chicken_{Aves} ATGGTTTCTAGTCAGCTCACCATGATAGGTCTCAAT
+!Gene=SecondGene Domain=Intron Property=Noncoding;
+#Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT
+#Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT
+#Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT
+!Gene=ThirdGene Domain=Exon2 Property=Coding;
+#Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA
+#Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA
+#Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA
+!Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++;
+</code>
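Taxa lines such as those in the example above can be split into taxon name, group name and data. The following Python sketch is illustrative only; `split_taxon_line` and `taxa_by_group` are hypothetical names, and the sketch assumes the label and the data are separated by whitespace on the same line.

```python
def split_taxon_line(line):
    """Split '#Human_{Mammal} ATGG' into (taxon, group, data).

    The group is None when no {group} suffix is present.
    """
    line = line.strip()
    if not line.startswith("#"):
        raise ValueError("taxa lines must start with '#'")
    parts = line[1:].split(None, 1)
    if not parts:
        raise ValueError("missing taxa label after '#'")
    name = parts[0]
    data = parts[1] if len(parts) > 1 else ""

    group = None
    if name.endswith("}") and "{" in name:
        name, group = name[:-1].split("{", 1)
        # The group may be attached with an underscore or just appended.
        if name.endswith("_"):
            name = name[:-1]
    return name, group, data


def taxa_by_group(lines):
    """Map each group name to the taxa assigned to it, in file order."""
    groups = {}
    for line in lines:
        taxon, group, _ = split_taxon_line(line)
        groups.setdefault(group, []).append(taxon)
    return groups
```

Applied to the first three taxa lines of the example, `taxa_by_group` would place Human and Mouse in the group Mammal and Chicken in the group Aves.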
+==== Distance Input Data ====
+  * distances are written as a lower-left or an upper-right triangular matrix
+  * After writing the #mega, !Title, !Description, and !Format commands (some of which are optional), write all the taxa names (see below)
+  * Taxa names are followed by the distance matrix
+\\
+  * **Keywords for Format Statement:**
+^ Command ^ Setting ^ Remark ^ Example ^
+| DataType | Distance | Specifies that the file contains distance data | DataType=distance |
+| NSeqs | integer | Number of sequences | NSeqs=85 |
+| NTaxa | integer | Same as NSeqs | NTaxa=85 |
+| DataFormat | Lowerleft, upperright | Specifies whether the data is in lower left triangular matrix or the upper right triangular matrix | DataFormat=lowerleft |
+  * **Defining Groups:**
+    * see above
+=== example ===
+<code>
+#mega
+!Title: Concatenated Files;
+!Format DataType=Distance DataFormat=LowerLeft NTaxa=6;
+#Rodent
+#Primate
+#Lagomorpha
+#Artiodactyla
+#Carnivora
+#Perissodactyla
+.514
+.535 0.436
+.530 0.388 0.418
+.521 0.353 0.417 0.345
+.500 0.331 0.402 0.327 0.349
+</code>
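A lower-left matrix like the one in the example above can be expanded into a full symmetric matrix. This is a minimal Python sketch, assuming the layout shown in the example (no diagonal entries, one more value on each successive row); the function name `read_lower_left` is hypothetical.

```python
def read_lower_left(taxa, rows):
    """Build a full symmetric distance matrix from lower-left rows.

    rows[k] holds the distances between taxon k+1 and taxa 0..k,
    written as whitespace-separated numbers. Diagonal entries are 0.0.
    """
    n = len(taxa)
    if len(rows) != n - 1:
        raise ValueError("expected one row fewer than the number of taxa")
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        values = [float(v) for v in row.split()]
        if len(values) != i:
            raise ValueError(f"row for {taxa[i]} should hold {i} distances")
        for j, distance in enumerate(values):
            # Mirror each value so the matrix is symmetric.
            matrix[i][j] = matrix[j][i] = distance
    return matrix
```

Values written without a leading zero, such as `.514`, are accepted by `float` and read as 0.514.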
 ===== How to cite =====
-  * When referring to MEGA in the main text of your publication, you may choose a format such as:\\
+Citation for MEGA 5:
-    Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007).
+  * Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution doi: 10.1093/molbev/msr121.
-  * When including a MEGA citation in the Literature Cited/Bibliography section, you may use the following:\\
-Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24:1596-1599. (Publication PDF at http://www.kumarlab.net/publications)
+\\
+Citation for MEGA 4:
+  * Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.