mega
Differences
mega [2008/05/19 16:25] – heidi | mega [2011/07/07 11:50] (current) – heidi | ||
---|---|---|---|
Line 8: | Line 8: | ||
\\ | \\ | ||
- | Version | + | Version |
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. | ||
+ | |||
Line 15: | Line 16: | ||
===== Program information ===== | ===== Program information ===== | ||
- | * Windows | + | * Windows XP, Vista, 7 (with at least 64 MB of RAM, 20 MB of available hard disk space) |
* MEGA can also be run on other operating systems for which Windows emulators are available: | * MEGA can also be run on other operating systems for which Windows emulators are available: | ||
* Macintosh: Windows using VirtualPC | * Macintosh: Windows using VirtualPC | ||
* Sun Workstation: | * Sun Workstation: | ||
* Linux: Windows using VMWare | * Linux: Windows using VMWare | ||
+ | |||
Line 27: | Line 29: | ||
* RNA | * RNA | ||
* nucleotide | * nucleotide | ||
+ | * distance | ||
* (protein sequences) | * (protein sequences) | ||
+ | |||
===== Input Files ===== | ===== Input Files ===== | ||
* ASCII-text files | * ASCII-text files | ||
* extension: *.MEG | * extension: *.MEG | ||
+ | * Importing Data from Other Formats: | ||
+ | * CLUSTAL | ||
+ | * [[NEXUS]] | ||
+ | * [[PHYLIP]] (Interleaved/ | ||
+ | * GCG | ||
+ | * [[FASTA]] | ||
+ | * PIR | ||
+ | * NBRF | ||
+ | * MSF | ||
+ | * IG | ||
+ | * Internet (NCBI) XML format | ||
+ | |||
+ | |||
+ | |||
==== Common Features ==== | ==== Common Features ==== | ||
Line 52: | Line 70: | ||
!Description This is detailed information the data file; | !Description This is detailed information the data file; | ||
</ | </ | ||
- | * **Format statement**: | + | * **Format statement**: |
+ | * written after the Title or the Description statement | ||
* contains one or more command statements | * contains one or more command statements | ||
- | * A command statement contains a command and a valid setting keyword ('' | + | * A command statement contains a command and a valid setting keyword ('' |
\\ | \\ | ||
- | * Comments: | + | * Comments: |
- | * keywords: | + | * anywhere in the data file |
- | * Rules for Taxa Names: Distance matrices as well as sequence data may come from species, populations, | + | * can span multiple lines |
- | * ‘#’ Sign: Every label must be written on a new line, and a '#' | ||
- | * Characters: Taxa labels | + | * can be nested |
+ | * keywords: | ||
+ | * written in any combination of lower- and upper-case letters | ||
+ | * Taxa Names: | ||
+ | * ‘#’ Sign: Every label must be written on a new line, and a '#' | ||
+ | * no restrictions on the length of the labels | ||
+ | * not required to be unique | ||
+ | * must start with alphanumeric characters (0-9, a-z, and A-Z) or a special character: '' | ||
+ | * After the first character, taxa labels may contain the following additional special characters:'' | ||
+ | * For multiple word labels, an underscore can be used to represent a blank space | ||
+ | \\ | ||
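The comment rules above (square brackets, nesting allowed, spanning multiple lines) can be illustrated with a small depth counter. This is only a sketch: `strip_comments` is a hypothetical helper, not part of MEGA, and treating unbalanced brackets as an error is an assumption rather than documented MEGA behaviour.

```python
def strip_comments(text: str) -> str:
    """Remove [bracketed] comments, including nested and multi-line ones."""
    depth = 0          # current nesting level of open '[' brackets
    kept = []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                # Assumption: a stray ']' is reported instead of ignored.
                raise ValueError("unmatched ']'")
            depth -= 1
        elif depth == 0:
            kept.append(ch)   # keep only text outside every comment
    if depth != 0:
        raise ValueError("unclosed '['")
    return "".join(kept)
```

Because only the nesting depth is tracked, line breaks inside a comment need no special handling.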
+ | |||
+ | |||
+ | |||
+ | ==== Sequence Input Data ==== | ||
+ | * must consist of two or more sequences of equal length | ||
+ | * sequences must be aligned | ||
+ | * written in IUPAC single-letter codes | ||
+ | * Sequences can be written in any combination of upper- and lower-case letters | ||
+ | * spaces and tabs are ignored | ||
+ | * commonly used special symbols: period (.) -> identical sites, dash (-) -> alignment gaps, question mark (?) -> missing data | ||
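The rules above (two or more aligned sequences of equal length, whitespace ignored, a period marking identity) can be sketched as a small normalisation step. `expand_alignment` is a hypothetical name, and converting everything to upper case is an assumption made for convenience, since the format accepts either case. The period is resolved against the first sequence.

```python
def expand_alignment(seqs, identical="."):
    """Strip whitespace, check lengths, and expand the identical-site symbol."""
    # Spaces and tabs are ignored; case is normalised (an assumption).
    cleaned = ["".join(s.split()).upper() for s in seqs]
    if len(cleaned) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in cleaned}) != 1:
        raise ValueError("sequences must have equal length")
    reference = cleaned[0]
    expanded = [reference]
    for seq in cleaned[1:]:
        # Replace each identical-site symbol with the reference residue.
        expanded.append(
            "".join(r if c == identical else c for r, c in zip(reference, seq))
        )
    return expanded
```

Gap (-) and missing-data (?) symbols are left untouched, since they carry information of their own.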
+ | |||
+ | \\ | ||
+ | * **Keywords for Format Statement: | ||
+ | |||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | | DataType | DNA, RNA, nucleotide, protein | | DataType=DNA | | ||
+ | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
+ | | NTaxa | integer | Synonymous with NSeqs | NTaxa=85 | | ||
+ | NSites | integer | Number of sites (nucleotides or amino acids) | NSites=4592 | ||
+ | Property | Exon, Intron, Coding, Noncoding, and End | Specifies whether a domain is protein coding. Exon and Coding are synonymous, as are Intron and Noncoding. End specifies that the domain with the given name ends at this point. | Property=Coding | ||
+ | | Indel | single character | dash (-) to identify insertion/ | ||
+ | | Identical | single character | use period (.) to show identity with the first sequence | Identical = . | | ||
+ | MatchChar | single character | Synonymous with the Identical keyword | MatchChar = . | ||
+ | | Missing | single character | use question mark (?) to indicate missing data | Missing = ? | | ||
+ | | CodeTable | A name | This instruction gives the name of the code table for the protein coding domains of the data | CodeTable = Standard | | ||
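As an illustration of how the commands in this table combine into one Format statement, the sketch below reads a statement into a dictionary. `parse_format` is a hypothetical helper, not MEGA code. It relies on the `!Format ... ;` shape used in the examples on this page, on keywords being case-insensitive, and on the synonyms listed in the table (NTaxa for NSeqs, MatchChar for Identical).

```python
import re

# Synonyms taken from the table above, folded onto one canonical key.
ALIASES = {"ntaxa": "nseqs", "matchchar": "identical"}

def parse_format(statement: str) -> dict:
    """Parse '!Format Command=Setting ...;' into a dict with lower-case keys."""
    body = statement.strip()
    if not body.lower().startswith("!format"):
        raise ValueError("not a Format statement")
    body = body[len("!format"):].rstrip()
    if body.endswith(";"):
        body = body[:-1]          # drop the terminating semicolon
    settings = {}
    # Accept both 'NSeqs=85' and 'Identical = .' spacing.
    for command, setting in re.findall(r"(\w+)\s*=\s*(\S+)", body):
        key = command.lower()     # keywords are case-insensitive
        settings[ALIASES.get(key, key)] = setting
    return settings
```

Settings are kept as strings; converting NSeqs or NSites to integers is left to the caller.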
+ | |||
+ | |||
+ | * **Defining Genes and Domains:** | ||
+ | * attributes of different sites (and groups of sites, termed domains) are specified within the data "on the spot" rather than in an attributes block before or after the actual data. | ||
+ | | ||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | | Domain | A name | defines a domain with the given name | Domain=first_exon | | ||
+ | | Gene | A name | defines a gene with the given name | Gene=cytb | | ||
+ | Property | Exon, Intron, Coding, Noncoding, and End | specifies the protein-coding attribute for a domain | Property=Coding | ||
+ | | CodonStart | A number | specifies the site where the next 1st-codon position will be found in a protein-coding domain | CodonStart=2 | | ||
+ | |||
+ | * **Defining Groups:** | ||
+ | * taxa can be assigned to groups in sequence data files as well as in distance data files | ||
+ | * the group name is written in curly brackets ({}) after the taxon name; it can be joined to the taxon name with an underscore or simply appended | ||
+ | * there must be no spaces between the taxon name and the group name | ||
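A short sketch of the group rule above: a taxon label such as `#Human_{Primates}` or `#Human{Primates}` is split into the taxon name and the group name. `split_label` and both sample labels are hypothetical. The sketch assumes that an underscore directly before the opening bracket is a separator and not part of the taxon name.

```python
import re

# '#', taxon name, optional underscore separator, then '{group}'.
_GROUPED = re.compile(r"^#(?P<taxon>[^{}\s]+?)_?\{(?P<group>[^{}]+)\}$")

def split_label(label: str):
    """Return (taxon, group); group is None when no {group} is attached."""
    label = label.strip()
    match = _GROUPED.match(label)
    if match:
        return match.group("taxon"), match.group("group")
    if label.startswith("#") and len(label) > 1:
        return label[1:], None
    raise ValueError("taxon labels must start with '#'")
```

Underscores inside the taxon name are preserved, so `#Homo_sapiens{Primates}` yields the taxon `Homo_sapiens`.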
+ | |||
+ | * **Labelling Individual Sites:** | ||
+ | * The individual sites in nucleotide or amino acid data can be labeled to construct non-contiguous sets of sites. | ||
+ | * Each site can be associated with only one label | ||
+ | * A label can be a letter or a number. | ||
=== example === | === example === | ||
+ | < | ||
+ | !Gene=FirstGene Domain=Exon1 Property=Coding; | ||
+ | # | ||
+ | # | ||
+ | # | ||
+ | |||
+ | !Gene=SecondGene Domain=Intron Property=Noncoding; | ||
+ | #Human ATTCCCAGGGAATTCCCGGGGGGTTTAAGGCCCCTTTAAAGAAAGAT | ||
+ | #Mouse GTAGCGCGCGTCGTCAGAGCTCCCAAGGGTAGCAGTCACAGAAAGAT | ||
+ | #Chicken GTAAAAAAAAAAGTCAGAGCTCCCCCCAATATATATCACAGAAAGAT | ||
+ | |||
+ | !Gene=ThirdGene Domain=Exon2 Property=Coding; | ||
+ | #Human ATCTGCTCTCGAGTACTGATACAAATGACTTCTGCGTACAACTGA | ||
+ | #Mouse ATCTGATCTCGTGTGCTGGTACGAATGATTTCTGCGTTCAACTGA | ||
+ | #Chicken ATCTGCTCTCGAGTACTGCTACCAATGACTTCTGCGTACAACTGA | ||
+ | !Label +++__-+++-a-+++-L-+++-k-+++123+++-_-+++---+++; | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Distance Input Data ==== | ||
+ | * distances are written as a lower-left or an upper-right triangular matrix | ||
+ | * After writing the # | ||
+ | * Taxa names are followed by the distance matrix | ||
+ | |||
+ | \\ | ||
+ | * **Keywords for Format Statement: | ||
+ | |||
+ | ^ Command ^ Setting ^ Remark ^ Example ^ | ||
+ | DataType | Distance | Specifies that the file contains distance data | DataType=distance | ||
+ | | NSeqs | integer | Number of sequences | NSeqs=85 | | ||
+ | | NTaxa | integer | Same as NSeqs | NTaxa=85 | | ||
+ | DataFormat | LowerLeft, UpperRight | Specifies whether the data is written as a lower-left or an upper-right triangular matrix | DataFormat=LowerLeft | ||
+ | |||
+ | |||
+ | * **Defining Groups:** | ||
+ | * see above | ||
+ | |||
+ | === example === | ||
+ | < | ||
+ | #mega | ||
+ | !Title: Concatenated Files; | ||
+ | !Format DataType=Distance DataFormat=LowerLeft NTaxa=6; | ||
+ | |||
+ | #Rodent | ||
+ | #Primate | ||
+ | #Lagomorpha | ||
+ | # | ||
+ | #Carnivora | ||
+ | # | ||
+ | | ||
+ | 0.514 | ||
+ | 0.535 0.436 | ||
+ | 0.530 0.388 0.418 | ||
+ | 0.521 0.353 0.417 0.345 | ||
+ | 0.500 0.331 0.402 0.327 0.349 | ||
+ | </ | ||
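To show how a lower-left matrix like the one in this example maps onto a full symmetric matrix, here is a minimal sketch. `lower_left_to_full` is a hypothetical helper. It assumes the layout seen in the example: no diagonal entries, so NTaxa=6 gives five rows holding 1 to 5 values.

```python
def lower_left_to_full(rows, n_taxa):
    """Expand lower-left triangular rows (no diagonal) into an n x n matrix."""
    if len(rows) != n_taxa - 1:
        raise ValueError("expected n_taxa - 1 rows")
    full = [[0.0] * n_taxa for _ in range(n_taxa)]
    for i, row in enumerate(rows, start=1):
        if len(row) != i:
            raise ValueError(f"row {i} must contain {i} distances")
        for j, distance in enumerate(row):
            # Mirror each value so that full[i][j] == full[j][i].
            full[i][j] = full[j][i] = distance
    return full


def parse_rows(text):
    """Turn whitespace-separated matrix lines into lists of floats."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
```

Row and column indices follow the order of the taxon labels in the file, so `full[1][0]` is the distance between the second and the first taxon.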
+ | |||
+ | |||
===== How to cite ===== | ===== How to cite ===== | ||
- | * When referring to MEGA in the main text of your publication, | + | Citation for MEGA 5: |
- | Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura, Dudley, Nei, and Kumar 2007). | + | * Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis |
- | * When including a MEGA citation in the Literature Cited/ | + | |
- | Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis | + | |
+ | \\ | ||
+ | Citation for MEGA 4: | ||
+ | * Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599. |
mega.1211207107.txt.gz · Last modified: 2008/07/22 13:30 (external edit)