mega
| mega [2008/05/16 15:55] – heidi | mega [2011/07/07 11:50] (current) – heidi | ||
|---|---|---|---|
| Line 8: | Line 8: | ||
| \\ | \\ | ||
| - | Version | + | Version | 
| MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | ||
| + | |||
| Line 15: | Line 16: | ||
| ===== Program information ===== | ===== Program information ===== | ||
| - | * Windows | + | * Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space) | 
| * MEGA can also be run on other operating systems for which Windows emulators are available: | * MEGA can also be run on other operating systems for which Windows emulators are available: | ||
| * Macintosh: Windows using VirtualPC | * Macintosh: Windows using VirtualPC | ||
| * Sun Workstation: | * Sun Workstation: | ||
| * Linux: Windows using VMWare | * Linux: Windows using VMWare | ||
| + | |||
| + | |||
| ===== Data type handled ===== | ===== Data type handled ===== | ||
| * DNA | * DNA | ||
| + | * RNA | ||
| + | * nucleotide | ||
| + | * distance | ||
| * (protein sequences) | * (protein sequences) | ||
| + | |||
| ===== Input Files ===== | ===== Input Files ===== | ||
| + | * ASCII-text files | ||
| + | * extension: *.MEG | ||
| + | * Importing Data from Other Formats: | ||
| + | * CLUSTAL | ||
| + | * [[NEXUS]] | ||
| + | * [[PHYLIP]] (Interleaved/ | ||
| + | * GCG | ||
| + | * [[FASTA]] | ||
| + | * PIR | ||
| + | * NBRF | ||
| + | * MSF | ||
| + | * IG | ||
| + | * Internet (NCBI) XML format | ||
| + | |||
| + | |||
| + | |||
| + | ==== Common Features ==== | ||
| + | * first line: must contain the keyword #MEGA | ||
| + | * second line: may contain a succinct description of the data, called the Title. The **Title statement** follows these rules: | ||
| + | * always begins with '' | ||
| + | * must not occupy more than one line of text | ||
| + | * must not contain a semicolon inside the statement | ||
| + | * example: < | ||
| + | #mega | ||
| + | !Title This is an example title; | ||
| + | </ | ||
| + | * third line: **Description statement**: | ||
| + | * always begins with '' | ||
| + | * may occupy multiple lines | ||
| + | * must not contain a semicolon inside the statement | ||
| + | * example: < | ||
| + | #mega | ||
| + | !Title This is an example title; | ||
| + | !Description This is detailed information about the data file; | ||
| + | </ | ||
| + | * **Format statement**: | ||
| + | * written after the Title or the Description statement | ||
| + | * contains one or more command statements | ||
| + | * A command statement contains a command and a valid setting keyword ('' | ||
| + | |||
| + | \\ | ||
| + | * Comments: | ||
| + | * anywhere in the data file | ||
| + | * can span multiple lines | ||
| + | * enclosed in square brackets ([ and ]) | ||
| + | * can be nested | ||
| + | * Keywords: | ||
| + | * written in any combination of lower- and upper-case letters | ||
| + | * Taxa Names: | ||
| + | * ‘#’ Sign: Every label must be written on a new line, and a '#' | ||
| + | * no restrictions on the length of the labels | ||
| + | * not required to be unique (although identical labels may result in ambiguities and should be avoided) | ||
| + | * must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: '' | ||
| + | * After the first character, taxa labels may contain the following additional special characters:'' | ||
| + | * For multiple word labels, an underscore can be used to represent a blank space | ||
| + | |||
| + | \\ | ||
| + | |||
| + | |||
| + | |||
| + | |||
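The common rules above can be checked mechanically. The following is a minimal Python sketch, not part of MEGA itself: the function names `strip_comments` and `parse_header` are hypothetical, and the statement keywords `!Title` and `!Description` as well as the terminating semicolon are taken from the examples on this page. It removes (possibly nested) square-bracket comments, checks for the `#mega` keyword, and extracts the Title and Description statements.

```python
import re


def strip_comments(text: str) -> str:
    """Remove [ ... ] comments, which may span lines and be nested."""
    kept = []
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def parse_header(text: str) -> dict:
    """Return the Title and Description found in a MEGA-style file (sketch)."""
    body = strip_comments(text).lstrip()
    # Keywords may be written in any mix of upper and lower case.
    if not body.lower().startswith("#mega"):
        raise ValueError("first line must contain the #MEGA keyword")
    header = {}
    # Assumption: a statement starts with '!' and runs up to the next semicolon.
    for match in re.finditer(r"!(\w+)[ \t:]*([^;]*);", body):
        keyword, value = match.group(1).lower(), match.group(2)
        if keyword == "title":
            if "\n" in value:
                raise ValueError("Title must not occupy more than one line")
            header["title"] = value.strip()
        elif keyword == "description":
            # A Description may span several lines; collapse the whitespace.
            header["description"] = " ".join(value.split())
    return header
```

Because a semicolon ends a statement in this sketch, a semicolon inside a Title or Description would cut the text short, which matches the rule that these statements must not contain one.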
| + | ==== Sequence Input Data ==== | ||
| + | * must consist of two or more sequences of equal length | ||
| + | * sequences must be aligned | ||
| + | * written in IUPAC single-letter codes | ||
| + | * Sequences can be written in any combination of upper- and lower-case letters | ||
| + | * spaces and tabs are ignored | ||
| + | * commonly used special symbols: period (.) for identical sites, dash (-) for alignment gaps, question mark (?) for missing data | ||
| + | |||
| + | \\ | ||
| + | * **Keywords for Format Statement:** | ||
| + | |||
| + | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
| + | | DataType | DNA, RNA, nucleotide, protein | | DataType=DNA | | ||
| + | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
| + | | NTaxa | integer | Synonymous with NSeqs | NTaxa=85 | | ||
| + | | NSites | integer | Number of nucleotides | NSites=4592 | | ||
| + | | Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point | Property=Coding | | ||
| + | | Indel | single character | dash (-) to identify insertion/ | ||
| + | | Identical | single character | use period (.) to show identity with the first sequence | Identical = . | | ||
| + | | MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . | | ||
| + | | Missing | single character | use question mark (?) to indicate missing data | Missing = ? | | ||
| + | | CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard | | ||
| + | |||
| + | |||
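As an illustration of how the command statements in the table fit together, here is a small Python sketch that splits a Format statement into command/setting pairs. The function name `parse_format` and the alias table are assumptions made for this example. The `!Format ... ;` form is taken from the distance example further down this page, and the two aliases reflect the synonyms listed in the table (NTaxa for NSeqs, MatchChar for Identical).

```python
import re

# Synonymous commands from the table, mapped to one canonical name (assumption).
ALIASES = {"ntaxa": "nseqs", "matchchar": "identical"}


def parse_format(statement: str) -> dict:
    """Split '!Format Command=Setting ... ;' into a dict (sketch)."""
    text = statement.strip()
    if not text.lower().startswith("!format") or not text.endswith(";"):
        raise ValueError("expected a statement of the form '!Format ... ;'")
    inner = text[len("!format"):-1]
    settings = {}
    # Spaces around '=' are tolerated, as in 'Identical = .'.
    for command, setting in re.findall(r"(\w+)\s*=\s*([^\s=]+)", inner):
        name = command.lower()  # commands are case-insensitive
        settings[ALIASES.get(name, name)] = setting
    return settings
```

For example, `parse_format("!Format DataType=DNA NSeqs=85 Identical = . Missing=?;")` yields the four settings keyed by their lower-cased command names. All settings are returned as strings; converting NSeqs or NSites to integers is left to the caller.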
| + | * **Defining Genes and Domains:** | ||
| + | * attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data. | ||
| + |  | ||
| + | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
| + | | Domain | A name | defines a domain with the given name | Domain=first_exon | | ||
| + | | Gene | A name | defines a gene with the given name | Gene=cytb | | ||
| + | | Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding | | ||
| + | | CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 | | ||
| + | |||
| + | * **Defining Groups:** | ||
| + | * taxa can be assigned to groups in sequence data files as well as in distance data files. | ||
| + | * the group name is written in curly brackets ({}) after the taxon name. It can be joined to the taxon name with an underscore or simply appended. | ||
| + | * there must be no spaces between the taxon name and the group name | ||
| + | |||
| + | * **Labelling Individual Sites:** | ||
| + | * The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites. | ||
| + | * Each site can be associated with only one label | ||
| + | * A label can be a letter or a number. | ||
| === example === | === example === | ||
| + | < | ||
| + | !Gene=FirstGene Domain=Exon1 Property=Coding; | ||
| + | # | ||
| + | # | ||
| + | # | ||
| + | |||
| + | !Gene=SecondGene Domain=Intron Property=Noncoding; | ||
| + | #Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT | ||
| + | #Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT | ||
| + | #Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT | ||
| + | |||
| + | !Gene=ThirdGene Domain=Exon2 Property=Coding; | ||
| + | #Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA | ||
| + | #Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA | ||
| + | #Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA | ||
| + | !Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++; | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | |||
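To make the sequence rules concrete, the following Python sketch reads sequence blocks like those in the example above. It is an illustration under stated assumptions, not MEGA's own reader: the function name `parse_sequences` is hypothetical, comments are assumed to have been removed already, every `!` statement is assumed to end with a semicolon, and group names and `!Label` lines are ignored. Blocks for the same taxon are concatenated, spaces and tabs are dropped, and the identity symbol is resolved against the first sequence.

```python
import re


def parse_sequences(text: str, identical: str = ".") -> dict:
    """Return {taxon: sequence} from MEGA-style sequence data (sketch)."""
    # Drop '!' statements (Title, Format, Gene/Domain, Label) up to the semicolon.
    body = re.sub(r"![^;]*;", "", text)
    seqs = {}
    current = None
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.lower() == "#mega":
            continue
        if line.startswith("#"):
            parts = line[1:].split(None, 1)
            if not parts:          # a bare '#' carries no label
                current = None
                continue
            current = parts[0]
            chunk = parts[1] if len(parts) > 1 else ""
            seqs.setdefault(current, "")
        elif current is not None:
            chunk = line           # continuation of the current sequence
        else:
            continue
        # Spaces and tabs inside a sequence are ignored.
        seqs[current] += "".join(chunk.split())

    names = list(seqs)
    if len(names) < 2:
        raise ValueError("need two or more sequences")
    first = seqs[names[0]]
    for name in names:
        if len(seqs[name]) != len(first):
            raise ValueError(f"sequence {name!r} differs in length from {names[0]!r}")
        # Replace the identity symbol with the base from the first sequence.
        seqs[name] = "".join(
            ref if ch == identical else ch for ch, ref in zip(seqs[name], first)
        )
    return seqs
```

The length check enforces the requirement that all sequences are aligned and of equal length. The sketch does not validate IUPAC codes or interpret the Indel and Missing symbols.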
| + | ==== Distance Input Data ==== | ||
| + | * distances are given as a lower-left or an upper-right triangular matrix | ||
| + | * After writing the # | ||
| + | * Taxa names are followed by the distance matrix | ||
| + | |||
| + | \\ | ||
| + | * **Keywords for Format Statement:** | ||
| + | |||
| + | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
| + | | DataType | Distance | Specifies that the file contains distance data | DataType=distance | | ||
| + | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
| + | | NTaxa | integer | Same as NSeqs | NTaxa=85 | | ||
| + | | DataFormat | lowerleft, upperright | Specifies whether the data are given as a lower-left or an upper-right triangular matrix | DataFormat=lowerleft | | ||
| + | |||
| + | |||
| + | * **Defining Groups:** | ||
| + | * see above | ||
| + | |||
| + | === example === | ||
| + | < | ||
| + | #mega | ||
| + | !Title: Concatenated Files; | ||
| + | !Format DataType=Distance DataFormat=LowerLeft NTaxa=6; | ||
| + | |||
| + | #Rodent | ||
| + | #Primate | ||
| + | #Lagomorpha | ||
| + | # | ||
| + | #Carnivora | ||
| + | # | ||
| + |  | ||
| + | 0.514 | ||
| + | 0.535 0.436 | ||
| + | 0.530 0.388 0.418 | ||
| + | 0.521 0.353 0.417 0.345 | ||
| + | 0.500 0.331 0.402 0.327 0.349 | ||
| + | </ | ||
| + | |||
| + | |||
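The lower-left layout shown in the example can be expanded into a full symmetric matrix. The Python sketch below is one way to do this under some assumptions: the function name `parse_lower_left` is hypothetical, comments are assumed to have been removed, `!` statements are assumed to end with a semicolon, and the numbers are read as a flat stream, so the exact line breaks of the triangle are not checked.

```python
import re


def parse_lower_left(text: str):
    """Return (taxa, full symmetric matrix) from lower-left distance data (sketch)."""
    # Drop '!' statements such as Title and Format up to their semicolon.
    body = re.sub(r"![^;]*;", "", text)
    taxa, values = [], []
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.lower() == "#mega":
            continue
        if line.startswith("#"):
            taxa.append(line[1:].strip())   # taxa names come first
        else:
            values.extend(float(token) for token in line.split())

    n = len(taxa)
    expected = n * (n - 1) // 2             # entries below the diagonal
    if len(values) != expected:
        raise ValueError(f"expected {expected} distances for {n} taxa, got {len(values)}")

    matrix = [[0.0] * n for _ in range(n)]
    stream = iter(values)
    # Row i of the triangle holds the distances from taxon i to taxa 0..i-1.
    for i in range(1, n):
        for j in range(i):
            matrix[i][j] = matrix[j][i] = next(stream)
    return taxa, matrix
```

For six taxa, as in the example, the triangle holds 15 values. An upper-right matrix would need the same loop with the roles of the row and column indices swapped.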
| ===== How to cite ===== | ===== How to cite ===== | ||
| - | * When referring to MEGA in the main text of your publication, | + | Citation for MEGA 5: | 
| - | Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007). | + | * Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis | 
| - | * When including a MEGA citation in the Literature Cited/ | + | |
| - | Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis | + | |
| + | \\ | ||
| + | Citation for MEGA 4: | ||
| + | * Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599. | ||
mega.1210946159.txt.gz · Last modified: 2008/07/22 13:30 (external edit)
                
                