Differences

This shows you the differences between two versions of the page.

--- nexus [2007/12/12 08:45] – heidi
+++ nexus [2008/07/22 13:31] (current) – external edit 127.0.0.1
@@ Line 2: / Line 2: @@
 **{{nexus.pdf|NEXUS}}**: Description in Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997. NEXUS:  an extensible file format for systematic information.
 \\ Systematic Biology 46:590-621.
-http://wiki.christophchamp.com/index.php/NEXUS_file_format
 \\
@@ Line 12: / Line 10: @@
+===== format information =====
-===== Program information =====
 standard text file
-===== Data type handled =====
@@ Line 24: / Line 17: @@
+===== file format: =====
+  * NEXUS files are free-format, which means that the entire file could conceivably consist of a single, long line of text. It does not matter to Hickory where you break lines (as long as you don’t split up a keyword or the name of a locus, allele or population), nor does it matter if you use one space or a dozen spaces to separate the individual words (tokens) in the file. Tokens may be casually defined as sequences of characters separated by whitespace (e.g., spaces, carriage returns, line feeds, tabs, etc.)
+  * NEXUS files are for the most part not case-sensitive by default. A big exception is in the matrix command, where (by default) an allele named A is treated as being distinct from a
+\\
-===== Input Files =====
   * Comments can be added by enclosing text with brackets: ''[comment]''
   * first line must be: ''#NEXUS''
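The free-format rules above (whitespace-separated tokens, bracketed comments, a mandatory `#NEXUS` header) can be illustrated with a minimal Python sketch. The function names `strip_comments` and `tokenize` are hypothetical and not part of any NEXUS library. The sketch assumes comments may nest and ignores quoted tokens and punctuation handling, so it is a simplification, not a full reader.

```python
def strip_comments(text):
    """Drop everything inside [brackets]; nesting depth is tracked (assumption)."""
    kept = []
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def tokenize(text):
    """Split comment-free text on any whitespace and check the #NEXUS header."""
    tokens = strip_comments(text).split()
    if not tokens or tokens[0].upper() != "#NEXUS":
        raise ValueError("a NEXUS file must start with #NEXUS")
    return tokens[1:]


# Line breaks and repeated spaces do not change the token stream.
one_line = "#NEXUS BEGIN TAXA; [a comment] END;"
many_lines = "#NEXUS\nBEGIN    TAXA;\n[a comment]\nEND;"
print(tokenize(one_line) == tokenize(many_lines))
```

Because the text is split on arbitrary whitespace, a file written on one long line and the same file spread over many lines yield the same tokens, which is what "free-format" means here.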
@@ Line 38: / Line 28: @@
     * Blocks: series of commands, beginning with a BEGIN command and ending with an END command: <code>
 BEGIN block-name;
-command-name token . . . ;
+ command-name token . . . ;
-command-name token . . . ;
+ command-name token . . . ;
-...
+ ...
 END;
 </code>
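As a rough illustration of the BEGIN ... END structure, the following Python sketch groups semicolon-terminated commands into blocks. `split_blocks` is a hypothetical helper. It assumes the `#NEXUS` header and all comments have already been removed and that no semicolon occurs inside a quoted token.

```python
def split_blocks(text):
    """Return a list of (block_name, [command, ...]) pairs."""
    blocks = []
    current = None
    for raw in text.split(";"):
        words = raw.split()
        if not words:
            continue  # skip empty chunks, e.g. after the final semicolon
        keyword = words[0].upper()  # command names are not case-sensitive
        if keyword == "BEGIN" and len(words) > 1:
            current = (words[1].upper(), [])
            blocks.append(current)
        elif keyword == "END":
            current = None
        elif current is not None:
            current[1].append(" ".join(words))
    return blocks


example = """
BEGIN TAXA;
 DIMENSIONS NTAX=3;
 TAXLABELS taxon_1 taxon_2 taxon_3;
END;
"""
for name, commands in split_blocks(example):
    print(name, commands)
```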
@@ Line 47: / Line 37: @@
   * **TAXA:** TAXA block defines taxa and gives them names. The block also establishes the order (numbering) of the taxa. Taxa consist of the entities (biological species, haplotypes, manuscripts, etc.) whose attributes might be recorded in characters and whose relationships are described in trees <code>
 BEGIN TAXA;
-DIMENSIONS NTAX=number-of-taxa;
+ DIMENSIONS NTAX=number-of-taxa;
-TAXLABELS taxon-name [taxon-name ...] ;
+ TAXLABELS taxon-name [taxon-name ...] ;
 END;
 </code>
@@ Line 65: / Line 55: @@
   [TRANSPOSE]
   [INTERLEAVE]
-  [ITEMS=([MIN][MAX][MEDIAN][AVERAGE][VARIANCE][STCERROR][SAMPLESIZE][STATES])]        default: STATES
+  [ITEMS=([MIN][MAX][MEDIAN][AVERAGE][VARIANCE][STDERROR][SAMPLESIZE][STATES])]  default: STATES
-  [STATESFORMAT={STATESPRESENT|INDIVIDUALS|COUNT|FREQUENCY}]                           default: STATESPERSENT
+  [STATESFORMAT={STATESPRESENT|INDIVIDUALS|COUNT|FREQUENCY}]                     default: STATESPRESENT
   [[No]TOKENS]
  ;]
@@ Line 81: / Line 71: @@
  ;]
  MATRIX data-matrix;
+END;
+</code> example: <code>
+BEGIN CHARACTERS;
+ DIMENSIONS NCHAR=3;
+ CHARSTATELABELS 1 hair/absent present, 2 color/red blue, 3 size/small big;
+ FORMAT TOKENS;
+ MATRIX
+  taxon_1 absent red big
+  taxon_2 absent blue small
+  taxon_3 present blue small;
 END;
 </code>
-  * UNALIGNED: similar to a CHARACTRS block, but it contains unaligned molecular sequence data. <code>
+  * **UNALIGNED:** similar to a CHARACTERS block, but it contains unaligned molecular sequence data. <code>
 BEGIN UNALIGNED;
  [DIMENSIONS NEWTAXA NTAX=number-of-taxa;]
@@ Line 108: / Line 108: @@
 </code>
-  * DISTANCES: contains distance matrices <code>
+  * **DISTANCES:** contains distance matrices <code>
 BEGIN DISTANCES;
  [DIMENSIONS [NEWTAXA] NTAX=number-of-taxa NCHAR=number-of-characters;]
@@ Line 131: / Line 131: @@
 </code>
-  * DATA:
+  * **DATA:** a CHARACTERS block that defines the taxa as well as the characters (use of this block is not recommended) <code>
+BEGIN DATA;
+ DIMENSIONS NTAX=5 NCHAR=20;
+ FORMAT DATATYPE=DNA GAP=-;
+ MATRIX
+  taxon-1 A-CTAGGACTA---GATCAA
+  taxon-2 A-CCAGGACTAGCGGATCAA
+  taxon-3 A-CCAGGACTA---GATCAA
+  taxon-4 AGCCAGGACTA---GTTCAA
+  taxon-5 ATC-AGGACTA---GATCAA;
+END;
+</code>
+  * **SETS:** descriptions of collections of objects. These objects include characters, taxa, trees, states, and kinds of changes. In addition, partitions of characters, taxa, and trees can be formed. <code>
+BEGIN SETS;
+ [CHARSET charset-name [({STANDARD|VECTOR})]=character-set;]
+ [STATESET stateset-name [({STANDARD|VECTOR})]=state-set;]
+ [CHANGESET changeset-name=state-set<->state-set [state-set<->state-set...];]
+ [TAXSET taxset-name [({STANDARD|VECTOR})]=taxon-set;]
+ [TREESET treeset-name [({STANDARD|VECTOR})]=tree-set;]
+ [CHARPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]
+  =subset-name:character-set [, subset-name:character-set...];]
+ [TAXPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]
+  =subset-name:taxon-set [, subset-name:taxon-set...];]
+ [TREEPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]
+  =subset-name:tree-set [, subset-name:tree-set...];]
+END;
+</code> example: <code>
+BEGIN SETS;
+ CHARSET larval=1-3 5-8;
+ STATESET eyeless=0;
+ STATESET eyed=1 2 3;
+ CHANGESET eyeloss=eyed -> eyeless;
+ TAXSET outgroup=1-4;
+ TREESET AfrNZVicariance=3 5 9-12;
+ CHARPARTITION bodyparts=head:1-4 7, body:5 6, legs:8-10;
+END;
+</code>
-  * SETS: descriptions of collections of objects. These objects include characters, taxa, trees, states, and kinds of changes. In addition, partitions of characters, taxa, and trees can be formed.
+  * **ASSUMPTIONS:** assumptions about the data. These can include assignment of weights to various characters, specification of the nature of character changes, exclusion of particular characters, and designation of ancestral states. <code>
-  * ASSUMPTIONS: assumptions about the data. These can include assignment of weights to various characters, specification of the nature of character changes, exclusion of particular characters, and designation of ancestral states.
+BEGIN ASSUMPTIONS;
-  * CODONS: designates the sites in nucleotide sequence data that are protein coding and the position within a codon of each site and assigns a genetic code to molecular sequence data. Custom genetic codes can also be defined within this block.
+ [OPTIONS [DEFTYPE=type-name]
-  * **TREES**
+  [POLYTCOUNT={MINSTEPS|MAXSTEPS}]
-  * NOTES: allows attachment of additional information (text, pictures, etc.) to various objects (trees, taxa, characters, etc.) in the file.
+  [GAPMODE={MISSING|NEWSTATE}];]
+ [USERTYPE type-name[({STEPMATRIX|CSTREE})]=USERTYPE-description;]
+ [TYPESET [*] typeset-name [({STANDARD|VECTOR})]=TYPESET-definition;]
+ [WTSET [*] wtset-name [({STANDARD|VECTOR} {TOKENS|NOTOKENS})]=WTSET-definition;]
+ [EXSET [*] exset-name [({STANDARD|VECTOR})]=character-set;]
+ [ANCSTATES [*] ancstates-name [({STANDARD|VECTOR} {TOKENS|NOTOKENS})]=ANCSTATES-definition;]
+END;
+</code> example: <code>
+BEGIN ASSUMPTIONS;
+ OPTIONS DEFTYPE=ORD;
+ USERTYPE myOrd=4
+  0 1 2 3
+  . 1 2 3
+  1 . 1 2
+  2 1 . 1
+  3 2 1 .;
+ USERTYPE myTree (CSTREE)=((0,1)a, (2,3)b)c;
+ TYPESET * mixed=IRREV: 1 3 10, UNORD: 5-7;
+ WTSET * one=2: 1-3 6 11-15, 3: 7 8;
+ WTSET two=2:4 9, 3:1-3 5;
+ EXSET nolarval=1-9;
+ ANCSTATES mixed=0: 1 3 5-8 11, 1: 2 4 9-15;
+END;
+</code>
+  * **CODONS:** contains information about the genetic code, the regions of DNA and RNA sequences that are protein coding, and the location of triplets coding for amino acids in nucleotide sequences. <code>
+BEGIN CODONS;
+ [CODONPOSSET [*] name [({STANDARD|VECTOR})]=
+  N: character-set,
+  1: character-set,
+  2: character-set,
+  3: character-set;]
+ [GENETICCODE code-name
+  [([CODEORDER=123|other] [NUCORDER=TCAG|other] [[NO]TOKENS] [EXTENSIONS="symbols-list"])]
+   =genetic-code-description;]
+ [CODESET [*] codeset-name {(CHARACTERS|UNALIGNED|TAXA)}
+   =code-name:character-set or taxon-set [,code-name:character-set or taxon-set...];]
+END;
+</code>
+  * **TREES:** stores information about trees <code>
+BEGIN TREES;
+ [TRANSLATE arbitrary-token-used-in-tree-description valid-taxon-name
+  [, arbitrary-token-used-in-tree-description valid-taxon-name. . . ] ;]
+ [TREE [*] tree-name= tree-specification;]
+END;
+</code> example: <code>
+BEGIN TAXA;
+ TAXLABELS Scarabaeus Drosophila Aranaeus;
+END;
+BEGIN TREES;
+ TRANSLATE beetle Scarabaeus, fly Drosophila, spider Aranaeus;
+ TREE tree1 = ((1,2),3);
+ TREE tree2 = ((beetle,fly),spider);
+ TREE tree3 = ((Scarabaeus,Drosophila),Aranaeus);
+END;
+</code>
+  * **NOTES:** allows attachment of additional information (text, pictures, etc.) to various objects (trees, taxa, characters, etc.) in the file. <code>
+BEGIN NOTES;
+ [TEXT [TAXON=taxon-set] [CHARACTER=character-set] [STATE=state-set] [TREE=tree-set]
+  SOURCE={INLINE|FILE|RESOURCE} TEXT=text-or-source-description;]
+ [PICTURE [TAXON=taxon-set] [CHARACTER=character-set] [STATE=state-set] [TREE=tree-set]
+  [FORMAT={PICT|TIFF|EPS|JPEG|GIF}] [ENCODE={NONE|UUENCODE|BINHEX}]
+  SOURCE={INLINE|FILE|RESOURCE} PICTURE=picture-or-source-descriptor;]
+END;
+</code>
 \\
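The set definitions in the SETS example (for instance `CHARSET larval=1-3 5-8;`) combine single numbers and inclusive ranges. A minimal Python sketch of how such a definition might be expanded into explicit member numbers follows. `expand_set` is a hypothetical name, and the sketch handles only plain numbers and `a-b` ranges, not names, VECTOR format or other set notation.

```python
def expand_set(definition):
    """Expand e.g. '1-3 5-8' into [1, 2, 3, 5, 6, 7, 8] (ranges are inclusive)."""
    members = []
    for item in definition.split():
        if "-" in item:
            first, last = item.split("-")
            members.extend(range(int(first), int(last) + 1))
        else:
            members.append(int(item))
    return members


print(expand_set("1-3 5-8"))
print(expand_set("3 5 9-12"))
```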
@@ Line 149: / Line 252: @@
     * several of the object definitions follow a particular format: GENETICCODE, CODONPOSSET, CODESET, CHARSET, STATESET, CHANGESET, TAXSET, TREESET, CHARPARTITION, TAXPARTITION, TREEPARTITION, USERTYPE, WTSET, TYPESET, EXSET, ANCSTATES and TREE. Example: ''CHARSET larval =1-15''
+\\
 === example: ===
 <code>
@@ Line 173: / Line 277: @@
 ===== How to cite =====
+Description in Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997. NEXUS:  an extensible file format for systematic information.
+\\ Systematic Biology 46:590-621.
+===== NCL =====
+[[http://hydrodictyon.eeb.uconn.edu/ncl/|NCL]]
+NEXUS Class Library (NCL) is an integrated collection of C++ classes designed to let the user quickly write a program that reads NEXUS-formatted data files. It also allows easy extension of the NEXUS format to include new blocks of your own design.