NEXUS: Description in Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997. NEXUS: an extensible file format for systematic information.
Systematic Biology 46:590-621.
NEXUS is a file format designed to contain systematic data for use by computer programs.
The goals of the format are to allow future expansion, to include diverse kinds of information,
to be independent of particular computer operating systems, and to be easily processed
by a program.
standard text file
[comment]
#NEXUS
command-name token token . . . ;
BEGIN block-name; command-name token . . . ; command-name token . . . ; ... END;
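The layout above (a #NEXUS header, bracketed comments, semicolon-terminated commands, and BEGIN ... END; blocks) can be sketched as a small splitter. This is only an illustrative sketch, not a full NEXUS reader: the function names strip_comments and split_blocks are made up here, and the code assumes that no quoted token contains a semicolon or a bracket.

```python
def strip_comments(text):
    """Remove [bracketed comments], including nested ones."""
    out, depth = [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def split_blocks(text):
    """Return {BLOCK-NAME: [command, ...]} for the text of a NEXUS file.

    Assumption: quoted tokens never contain ';', '[' or ']'.
    """
    text = strip_comments(text).strip()
    if not text.upper().startswith("#NEXUS"):
        raise ValueError("missing #NEXUS header")
    # Every command, including BEGIN and END, ends with a semicolon.
    commands = [c.strip() for c in text[len("#NEXUS"):].split(";") if c.strip()]
    blocks, current = {}, None
    for cmd in commands:
        words = cmd.split()
        head = words[0].upper()  # command names are case-insensitive
        if head == "BEGIN":
            current = words[1].upper()
            blocks[current] = []
        elif head in ("END", "ENDBLOCK"):
            current = None
        elif current is not None:
            blocks[current].append(" ".join(words))
    return blocks


if __name__ == "__main__":
    sample = "#NEXUS [a comment] BEGIN TAXA; DIMENSIONS NTAX=2; TAXLABELS a b; END;"
    print(split_blocks(sample))
```

Splitting on semicolons first keeps the sketch short; a tokenizer that understands quoting would be needed for real files.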
The primary public blocks (those commonly defined) are listed below. In the syntax summaries, [] marks an optional item and {|} separates mutually exclusive options:
BEGIN TAXA;
  DIMENSIONS NTAX=number-of-taxa;
  TAXLABELS taxon-name [taxon-name ...];
END;
BEGIN CHARACTERS;
  DIMENSIONS [NEWTAXA NTAX=number-of-taxa] NCHAR=number-of-characters;
  [FORMAT
    [DATATYPE={STANDARD|DNA|RNA|NUCLEOTIDE|PROTEIN|CONTINUOUS}]    default: STANDARD
    [RESPECTCASE]    default: case is ignored (A and a are the same)
    [MISSING=symbol]    default: ?
    [GAP=symbol]
    [SYMBOLS="symbol [symbol...]"]
    [EQUATE="symbol=entry [symbol=entry...]"]
    [MATCHCHAR=symbol]
    [[NO]LABELS]
    [TRANSPOSE]
    [INTERLEAVE]
    [ITEMS=([MIN] [MAX] [MEDIAN] [AVERAGE] [VARIANCE] [STDERROR] [SAMPLESIZE] [STATES])]    default: STATES
    [STATESFORMAT={STATESPRESENT|INDIVIDUALS|COUNT|FREQUENCY}]    default: STATESPRESENT
    [[NO]TOKENS]
  ;]
  [ELIMINATE character-set;]
  [TAXLABELS taxon-name [taxon-name...];]
  [CHARSTATELABELS character-number [character-name] [/state-name [state-name...]] [, character-number [character-name] [/state-name [state-name...]] ...];]
  [CHARLABELS character-name [character-name...];]
  [STATELABELS character-number [character-name] [/state-name [state-name...]] [, character-number [character-name] [/state-name [state-name...]] ...];]
  MATRIX data-matrix;
END;
example:
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=3;
  CHARSTATELABELS 1 hair/absent present, 2 color/red blue, 3 size/small big;
  FORMAT TOKENS;
  MATRIX
    taxon_1 absent red big
    taxon_2 absent blue small
    taxon_3 present blue small;
END;
BEGIN UNALIGNED;
  [DIMENSIONS NEWTAXA NTAX=number-of-taxa;]
  [FORMAT
    [DATATYPE={STANDARD|DNA|RNA|NUCLEOTIDE|PROTEIN}]
    [RESPECTCASE]
    [MISSING=symbol]
    [SYMBOLS="symbol [symbol...]"]
    [EQUATE="symbol=entry [symbol=entry...]"]
    [[NO]LABELS]
  ;]
  [TAXLABELS taxon-name [taxon-name...];]
  MATRIX data-matrix;
END;
example:
BEGIN UNALIGNED;
  FORMAT DATATYPE=DNA;
  MATRIX
    taxon-1 ACTAGGACTAGATCAAGTT,
    taxon-2 ACCAGGACTAGCGGATCAAG,
    taxon-3 ACCAGGACTAGATCAAG;
END;
BEGIN DISTANCES;
  [DIMENSIONS [NEWTAXA] NTAX=number-of-taxa NCHAR=number-of-characters;]
  [FORMAT
    [TRIANGLE={LOWER|UPPER|BOTH}]
    [[NO]DIAGONAL]
    [[NO]LABELS]
    [MISSING=symbol]
    [INTERLEAVE]
  ;]
  [TAXLABELS taxon-name [taxon-name...];]
  MATRIX distance-matrix;
END;
example:
BEGIN DISTANCES;
  FORMAT TRIANGLE=UPPER;
  MATRIX
    taxon_1 0.0 1.0 2.0
    taxon_2     0.0 3.0
    taxon_3         0.0;
END;
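To make the TRIANGLE=UPPER layout concrete, here is a minimal sketch that expands the matrix body from the example into a full symmetric matrix. The function name parse_upper_triangle is hypothetical. The sketch assumes the diagonal is present, that labels are included, and that no taxon label can be read as a number.

```python
def is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_upper_triangle(matrix_text):
    """Expand an upper-triangular DISTANCES matrix (diagonal included).

    Assumptions: every row starts with a taxon label, and labels are
    never numeric, so a non-numeric token always opens a new row.
    """
    labels, rows = [], []
    for token in matrix_text.split():
        if is_number(token):
            if not rows:
                raise ValueError("distance value before any taxon label")
            rows[-1].append(float(token))
        else:
            labels.append(token)
            rows.append([])
    n = len(labels)
    dist = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        # Row i of an upper triangle holds columns i .. n-1.
        if len(row) != n - i:
            raise ValueError(
                f"row {labels[i]} has {len(row)} values, expected {n - i}"
            )
        for offset, value in enumerate(row):
            j = i + offset
            dist[i][j] = dist[j][i] = value
    return labels, dist


if __name__ == "__main__":
    body = "taxon_1 0.0 1.0 2.0 taxon_2 0.0 3.0 taxon_3 0.0"
    labels, dist = parse_upper_triangle(body)
    for label, row in zip(labels, dist):
        print(label, row)
```

Because rows are delimited by labels rather than by line breaks, the sketch also works when the matrix has been flattened onto one line.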
BEGIN DATA;
  DIMENSIONS NTAX=5 NCHAR=20;
  FORMAT DATATYPE=DNA GAP=-;
  MATRIX
    taxon-1 A-CTAGGACTA---GATCAA
    taxon-2 A-CCAGGACTAGCGGATCAA
    taxon-3 A-CCAGGACTA---GATCAA
    taxon-4 AGCCAGGACTA---GTTCAA
    taxon-5 ATC-AGGACTA---GATCAA;
END;
BEGIN SETS;
  [CHARSET charset-name [({STANDARD|VECTOR})]=character-set;]
  [STATESET stateset-name [({STANDARD|VECTOR})]=state-set;]
  [CHANGESET changeset-name=state-set<->state-set [state-set<->state-set...];]
  [TAXSET taxset-name [({STANDARD|VECTOR})]=taxon-set;]
  [TREESET treeset-name [({STANDARD|VECTOR})]=tree-set;]
  [CHARPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]=subset-name:character-set [, subset-name:character-set...];]
  [TAXPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]=subset-name:taxon-set [, subset-name:taxon-set...];]
  [TREEPARTITION partition-name [([{[NO]TOKENS}] [{STANDARD|VECTOR}])]=subset-name:tree-set [, subset-name:tree-set...];]
END;
example:
BEGIN SETS;
  CHARSET larval=1-3 5-8;
  STATESET eyeless=0;
  STATESET eyed=1 2 3;
  CHANGESET eyeloss=eyed -> eyeless;
  TAXSET outgroup=1-4;
  TREESET AfrNZVicariance=3 5 9-12;
  CHARPARTITION bodyparts=head:1-4 7, body:5 6, legs:8-10;
END;
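The set and partition definitions in this example use the STANDARD format: space-separated numbers and ranges, with partition subsets separated by commas. A minimal sketch of expanding them follows. The names expand_set and parse_partition are illustrative, and the sketch assumes purely numeric members; it does not handle set names used as members or any other shorthand.

```python
def expand_set(spec):
    """Expand a STANDARD-format set such as '1-3 5-8' into a list of numbers.

    Assumption: every member is a number or a lo-hi range.
    """
    members = []
    for token in spec.split():
        if "-" in token:
            lo, hi = token.split("-")
            members.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            members.append(int(token))
    return members


def parse_partition(spec):
    """Parse 'name:set, name:set' into {name: [numbers]}."""
    subsets = {}
    for chunk in spec.split(","):
        name, members = chunk.split(":")
        subsets[name.strip()] = expand_set(members)
    return subsets


if __name__ == "__main__":
    print(expand_set("1-3 5-8"))
    print(parse_partition("head:1-4 7, body:5 6, legs:8-10"))
```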
BEGIN ASSUMPTIONS;
  [OPTIONS [DEFTYPE=type-name] [POLYTCOUNT={MINSTEPS|MAXSTEPS}] [GAPMODE={MISSING|NEWSTATE}];]
  [USERTYPE type-name [({STEPMATRIX|CSTREE})]=USERTYPE-description;]
  [TYPESET [*] typeset-name [({STANDARD|VECTOR})]=TYPESET-definition;]
  [WTSET [*] wtset-name [({STANDARD|VECTOR} {TOKENS|NOTOKENS})]=WTSET-definition;]
  [EXSET [*] exset-name [({STANDARD|VECTOR})]=character-set;]
  [ANCSTATES [*] ancstates-name [({STANDARD|VECTOR} {TOKENS|NOTOKENS})]=ANCSTATES-definition;]
END;
example:
BEGIN ASSUMPTIONS;
  OPTIONS DEFTYPE=ORD;
  USERTYPE myOrd=4
    0 1 2 3
    . 1 2 3
    1 . 1 2
    2 1 . 1
    3 2 1 .;
  USERTYPE myTree (CSTREE)=((0,1)a, (2,3)b)c;
  TYPESET * mixed=IRREV: 1 3 10, UNORD: 5-7;
  WTSET * one=2: 1-3 6 11-15, 3: 7 8;
  WTSET two=2: 4 9, 3: 1-3 5;
  EXSET nolarval=1-9;
  ANCSTATES mixed=0: 1 3 5-8 11, 1: 2 4 9-15;
END;
BEGIN CODONS;
  [CODONPOSSET [*] name [({STANDARD|VECTOR})]=N: character-set, 1: character-set, 2: character-set, 3: character-set;]
  [GENETICCODE code-name [([CODEORDER=123|other] [NUCORDER=TCAG|other] [[NO]TOKENS] [EXTENSIONS="symbols-list"])]=genetic-code-description;]
  [CODESET [*] codeset-name ({CHARACTERS|UNALIGNED|TAXA})=code-name:character-set or taxon-set [, code-name:character-set or taxon-set...];]
END;
BEGIN TREES;
  [TRANSLATE arbitrary-token-used-in-tree-description valid-taxon-name [, arbitrary-token-used-in-tree-description valid-taxon-name...];]
  [TREE [*] tree-name=tree-specification;]
END;
example:
BEGIN TAXA;
  TAXLABELS Scarabaeus Drosophila Aranaeus;
END;
BEGIN TREES;
  TRANSLATE beetle Scarabaeus, fly Drosophila, spider Aranaeus;
  TREE tree1=((1,2),3);
  TREE tree2=((beetle,fly),spider);
  TREE tree3=((Scarabaeus,Drosophila),Aranaeus);
END;
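The three trees in this example describe the same topology using taxon numbers, TRANSLATE tokens, and taxon names. The following sketch shows one way to resolve all three forms to taxon names. The function resolve_labels is hypothetical, and the lookup order used here (TRANSLATE table first, then taxon number, then the token itself) is an assumption made for illustration. The sketch also assumes the tree carries no branch lengths, since a bare integer after a colon would be mistaken for a taxon number.

```python
import re

# A label is any run of characters that is not whitespace or tree punctuation.
LABEL = re.compile(r"[^\s(),:;]+")


def resolve_labels(tree, taxlabels, translate):
    """Rewrite every label in a tree description as a taxon name.

    Assumptions: no branch lengths, no quoted labels; lookup order is
    TRANSLATE token, then 1-based taxon number, then the label as given.
    """
    def lookup(match):
        token = match.group(0)
        if token in translate:
            return translate[token]
        if token.isdigit():
            return taxlabels[int(token) - 1]  # taxon numbers start at 1
        return token

    return LABEL.sub(lookup, tree)


if __name__ == "__main__":
    taxlabels = ["Scarabaeus", "Drosophila", "Aranaeus"]
    translate = {"beetle": "Scarabaeus", "fly": "Drosophila", "spider": "Aranaeus"}
    for tree in ("((1,2),3)", "((beetle,fly),spider)"):
        print(resolve_labels(tree, taxlabels, translate))
```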
BEGIN NOTES;
  [TEXT [TAXON=taxon-set] [CHARACTER=character-set] [STATE=state-set] [TREE=tree-set] SOURCE={INLINE|FILE|RESOURCE} TEXT=text-or-source-descriptor;]
  [PICTURE [TAXON=taxon-set] [CHARACTER=character-set] [STATE=state-set] [TREE=tree-set] [FORMAT={PICT|TIFF|EPS|JPEG|GIF}] [ENCODE={NONE|UUENCODE|BINHEX}] SOURCE={INLINE|FILE|RESOURCE} PICTURE=picture-or-source-descriptor;]
END;
CHARSET larval =1-15
#NEXUS
BEGIN TAXA;
  Dimensions NTax=4;
  TaxLabels fish frog snake mouse;
END;
BEGIN CHARACTERS;
  Dimensions NChar=20;
  Format DataType=DNA;
  Matrix
    fish  ACATA GAGGG TACCT CTAAG
    frog  ACATA GAGGG TACCT CTAAG
    snake ACATA GAGGG TACCT CTAAG
    mouse ACATA GAGGG TACCT CTAAG;
END;
BEGIN TREES;
  Tree best=(fish, (frog, (snake, mouse)));
END;
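A reader of a file like this one would normally check the matrix against the declared dimensions. The sketch below does that for the example: it joins the space-separated sequence chunks for each taxon and verifies the number of rows and the NCHAR length. The function name read_matrix is hypothetical, and the sketch assumes that no sequence chunk is spelled the same as a taxon label.

```python
def read_matrix(taxlabels, nchar, matrix_text):
    """Return {taxon: sequence}, checking the NTAX and NCHAR dimensions.

    Assumption: no sequence chunk is identical to a taxon label, so a
    token found in taxlabels always starts (or continues) that taxon's row.
    """
    rows, current = {}, None
    for token in matrix_text.split():
        if token in taxlabels:
            current = token
            rows.setdefault(current, "")  # setdefault tolerates interleaved rows
        elif current is None:
            raise ValueError(f"sequence data before any taxon label: {token}")
        else:
            rows[current] += token
    if len(rows) != len(taxlabels):
        raise ValueError(f"expected {len(taxlabels)} taxa, found {len(rows)}")
    for taxon, seq in rows.items():
        if len(seq) != nchar:
            raise ValueError(
                f"{taxon} has {len(seq)} characters, expected {nchar}"
            )
    return rows


if __name__ == "__main__":
    taxa = ["fish", "frog", "snake", "mouse"]
    body = " ".join(f"{t} ACATA GAGGG TACCT CTAAG" for t in taxa)
    for taxon, seq in read_matrix(taxa, 20, body).items():
        print(taxon, seq)
```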
The NEXUS Class Library (NCL) is an integrated collection of C++ classes that lets the user quickly write a program that reads NEXUS-formatted data files. It also makes it easy to extend the NEXUS format with new blocks of your own design.