mega
Differences
This shows you the differences between two versions of the page.
mega [2008/05/16 16:33] – heidi | mega [2011/07/07 11:50] (current) – heidi | ||
---|---|---|---|
Line 8: | Line 8: | ||
\\ | \\ | ||
- | Version | + | Version |
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | ||
+ | |||
Line 15: | Line 16: | ||
===== Program information ===== | ===== Program information ===== | ||
- | * Windows | + | * Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space) |
* MEGA also can be run on other operating systems for which Windows emulators are available: | * MEGA also can be run on other operating systems for which Windows emulators are available: | ||
* Macintosh: Windows using VirtualPC | * Macintosh: Windows using VirtualPC | ||
* Sun Workstation: | * Sun Workstation: | ||
* Linux: Windows using VMWare | * Linux: Windows using VMWare | ||
+ | |||
Line 27: | Line 29: | ||
* RNA | * RNA | ||
* nucleotide | * nucleotide | ||
+ | * distance | ||
* (protein sequences) | * (protein sequences) | ||
+ | |||
===== Input Files ===== | ===== Input Files ===== | ||
+ | * ASCII-text files | ||
+ | * extension: *.MEG | ||
+ | * Importing Data from Other Formats: | ||
+ | * CLUSTAL | ||
+ | * [[NEXUS]] | ||
+ | * [[PHYLIP]] (Interleaved/ | ||
+ | * GCG | ||
+ | * [[FASTA]] | ||
+ | * PIR | ||
+ | * NBRF | ||
+ | * MSF | ||
+ | * IG | ||
+ | * Internet (NCBI) XML format | ||
+ | |||
+ | |||
+ | |||
+ | ==== Common Features ==== | ||
+ | * first line: must contain the keyword #MEGA | ||
+ | * second line: data file may contain a succinct description of the data (called Title). The **Title statement** is written according to a set of rules: | ||
+ | * always begins with '' | ||
+ | * must not occupy more than one line of text | ||
+ | * must not contain a semicolon inside the statement | ||
+ | * example: < | ||
+ | #mega | ||
+ | !Title This is an example title; | ||
+ | </ | ||
+ | * third line: **Description statement**: | ||
+ | * always begins with '' | ||
+ | * may occupy multiple lines | ||
+ | * must not contain a semicolon inside the statement | ||
+ | * example: < | ||
+ | #mega | ||
+ | !Title This is an example title; | ||
+ | !Description This is detailed information about the data file; | ||
+ | </ | ||
+ | * **Format statement**: | ||
+ | * written after the Title or the Description statement | ||
+ | * contains one or more command statements | ||
+ | * A command statement contains a command and a valid setting keyword ('' | ||
+ | |||
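The header rules above can be sketched in Python. This is only an illustration, not MEGA's own parser: the function name `parse_mega_header` is hypothetical, and the code assumes that every statement ends at the first semicolon, as in the examples above.

```python
import re

def parse_mega_header(text):
    """Collect the header statements of a MEGA file into a dict (illustrative only)."""
    lines = text.strip().splitlines()
    # Keywords may be written in any mix of upper- and lower-case letters.
    if not lines or lines[0].strip().lower() != "#mega":
        raise ValueError("first line must contain the keyword #MEGA")
    body = "\n".join(lines[1:])
    header = {}
    # Assumption: a statement starts with '!' and runs up to the first semicolon.
    for keyword, value in re.findall(r"!(\w+)\s*([^;]*);", body):
        key = keyword.lower()
        if key == "format":
            # Command statements have the form Command=Setting (spaces around '=' allowed).
            pairs = re.findall(r"(\w+)\s*=\s*(\S+)", value)
            header["format"] = {cmd.lower(): setting for cmd, setting in pairs}
        elif key in ("title", "description"):
            # A Description may span several lines; collapse the whitespace.
            header[key] = " ".join(value.split())
    return header
```

Commands are stored in lower case so that `DataType`, `datatype` and `DATATYPE` all map to the same key.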
+ | \\ | ||
+ | * Comments: | ||
+ | * anywhere in the data file | ||
+ | * can span multiple lines | ||
+ | * enclosed in square brackets ([ and ]) | ||
+ | * can be nested | ||
+ | * keywords: | ||
+ | * written in any combination of lower- and upper-case letters | ||
+ | * Taxa Names: | ||
+ | * ‘#’ Sign: Every label must be written on a new line, and a '#' | ||
+ | * no restrictions on the length of the labels | ||
+ | * not required to be unique (although identical labels may result in ambiguities and should be avoided) | ||
+ | * must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: '' | ||
+ | * After the first character, taxa labels may contain the following additional special characters:'' | ||
+ | * For multiple word labels, an underscore can be used to represent a blank space | ||
+ | |||
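Because comments can be nested and can span lines, removing them needs a depth counter rather than a simple pattern match. A minimal sketch, assuming square brackets are used only for comments (the helper name `strip_comments` is hypothetical):

```python
def strip_comments(text):
    """Remove [ ... ] comments, including nested ones, from MEGA file text."""
    depth = 0  # how many '[' are currently open
    kept = []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError("']' without a matching '['")
            depth -= 1
        elif depth == 0:
            # Only characters outside every comment are kept.
            kept.append(ch)
    if depth:
        raise ValueError("unclosed '[' comment")
    return "".join(kept)
```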
+ | \\ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Sequence Input Data ==== | ||
+ | * must consist of two or more sequences of equal length | ||
+ | * sequences must be aligned | ||
+ | * written in IUPAC single-letter codes | ||
+ | * Sequences can be written in any combination of upper- and lower-case letters | ||
+ | * spaces and tabs are ignored | ||
+ | * commonly used special symbols: period (.) -> identical sites, dash (-) -> alignment gaps, question mark (?) -> missing data | ||
+ | |||
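The identical-site symbol means "same as the first sequence at this position". A short Python sketch of how such rows could be expanded; `expand_identical` is a hypothetical helper, and the sequences are assumed to be already stripped of spaces and tabs:

```python
def expand_identical(sequences, identical="."):
    """Replace the identical-site symbol with the residue from the first sequence."""
    first = sequences[0]
    expanded = [first]
    for seq in sequences[1:]:
        # Aligned sequences must all have the same length.
        if len(seq) != len(first):
            raise ValueError("sequences must be of equal length")
        expanded.append(
            "".join(ref if ch == identical else ch for ref, ch in zip(first, seq))
        )
    return expanded
```

Gap (-) and missing-data (?) symbols are left untouched, so they survive the expansion.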
+ | \\ | ||
+ | * **Keywords for Format Statement: | ||
+ | |||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | | DataType | DNA, RNA, nucleotide, protein | | DataType=DNA | | ||
+ | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
+ | | NTaxa | integer | Synonymous with NSeqs | NTaxa=85 | | ||
+ | | NSites | integer | Number of nucleotides | NSites=4592 | | ||
+ | | Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=cyt_b | | ||
+ | | Indel | single character | dash (-) to identify insertion/ | ||
+ | | Identical | single character | use period (.) to show identity with the first sequence | Identical = . | | ||
+ | | MatchChar | single character | Synonymous with the identical keyword | MatchChar = . | | ||
+ | | Missing | single character | use question mark (?) to indicate missing data | Missing = ? | | ||
+ | | CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard | | ||
+ | |||
+ | |||
+ | * **Defining Genes and Domains:** | ||
+ | * attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data. | ||
+ | | ||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | | Domain | A name | defines a domain with the given name | Domain=first_exon | | ||
+ | | Gene | A name | defines a gene with the given name | Gene=cytb | | ||
+ | | Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=cytb | | ||
+ | | CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 | | ||
+ | |||
+ | * **Defining Groups:** | ||
+ | * taxa can be assigned to groups in sequence data files as well as in distance data files. | ||
+ | * the name of the group is written in a set of curly brackets ({}) following the taxa name. The group name can be attached to the taxa name with an underscore or simply appended to it. | ||
+ | * there should be no spaces between the taxa name and group name | ||
+ | |||
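Both ways of attaching a group name, with an underscore or directly appended, can be handled by one pattern. A sketch in Python; the function name `split_group` and the sample labels are illustrative assumptions, not taken from the MEGA documentation:

```python
import re

# Taxon name, an optional underscore, then the group name in curly brackets.
_LABEL = re.compile(r"^(?P<taxon>[^{}\s]+?)_?\{(?P<group>[^{}\s]+)\}$")

def split_group(label):
    """Return (taxon, group); group is None when the label carries no group."""
    match = _LABEL.match(label)
    if match:
        return match.group("taxon"), match.group("group")
    return label, None
```

For instance, `split_group("Human_{Primates}")` and `split_group("Human{Primates}")` both give `("Human", "Primates")`.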
+ | * **Labelling Individual Sites:** | ||
+ | * The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites. | ||
+ | * Each site can be associated with only one label | ||
+ | * A label can be a letter or a number. | ||
=== example === | === example === | ||
+ | < | ||
+ | !Gene=FirstGene Domain=Exon1 Property=Coding; | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | |||
+ | !Gene=SecondGene Domain=Intron Property=Noncoding; | ||
+ | #Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT | ||
+ | #Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT | ||
+ | #Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT | ||
+ | |||
+ | !Gene=ThirdGene Domain=Exon2 Property=Coding; | ||
+ | #Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA | ||
+ | #Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA | ||
+ | #Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA | ||
+ | !Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Distance Input Data ==== | ||
+ | * distances are written as a lower-left or an upper-right triangular matrix | ||
+ | * After writing the # | ||
+ | * Taxa names are followed by the distance matrix | ||
+ | |||
+ | \\ | ||
+ | * **Keywords for Format Statement: | ||
+ | |||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | | DataType | Distance | Specifies that the file contains distance data | DataType=distance | | ||
+ | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
+ | | NTaxa | integer | Same as NSeqs | NTaxa=85 | | ||
+ | | DataFormat | Lowerleft, upperright | Specifies whether the data is in lower left triangular matrix or the upper right triangular matrix | DataFormat=lowerleft | | ||
+ | |||
+ | |||
+ | * **Defining Groups:** | ||
+ | * see above | ||
+ | |||
+ | === example === | ||
+ | < | ||
+ | #mega | ||
+ | !Title: Concatenated Files; | ||
+ | !Format DataType=Distance DataFormat=LowerLeft NTaxa=6; | ||
+ | |||
+ | #Rodent | ||
+ | #Primate | ||
+ | #Lagomorpha | ||
+ | # | ||
+ | #Carnivora | ||
+ | # | ||
+ | | ||
+ | 0.514 | ||
+ | 0.535 0.436 | ||
+ | 0.530 0.388 0.418 | ||
+ | 0.521 0.353 0.417 0.345 | ||
+ | 0.500 0.331 0.402 0.327 0.349 | ||
+ | </ | ||
+ | |||
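A lower-left matrix such as the one in the example can be turned into a full symmetric matrix with a few lines of Python. This is a sketch under the assumption that the empty first row is omitted, so row i holds exactly i distances; `read_lower_left` is a hypothetical helper name:

```python
def read_lower_left(rows, n_taxa):
    """Build a symmetric n_taxa x n_taxa matrix from lower-left triangular rows."""
    matrix = [[0.0] * n_taxa for _ in range(n_taxa)]
    data_rows = [row.split() for row in rows if row.strip()]
    if len(data_rows) != n_taxa - 1:
        raise ValueError("expected %d rows of distances" % (n_taxa - 1))
    for i, fields in enumerate(data_rows, start=1):
        # Row i lists the distances from taxon i to taxa 0 .. i-1.
        if len(fields) != i:
            raise ValueError("row %d should contain %d values" % (i, i))
        for j, field in enumerate(fields):
            matrix[i][j] = matrix[j][i] = float(field)
    return matrix
```

The diagonal stays at 0.0, since each taxon has zero distance to itself.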
+ | |||
===== How to cite ===== | ===== How to cite ===== | ||
- | * When referring to MEGA in the main text of your publication, | + | Citation for MEGA 5: |
- | Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007). | + | * Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis |
- | * When including a MEGA citation in the Literature Cited/ | + | |
- | Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis | + | |
+ | \\ | ||
+ | Citation for MEGA 4: | ||
+ | * Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599. |
mega.1210948419.txt.gz · Last modified: 2008/07/22 13:30 (external edit)