PHYLIP


Version 3.67 (July 2007)

PHYLIP, the Phylogeny Inference Package, is a package of programs for inferring phylogenies (evolutionary trees). It can infer phylogenies by parsimony, compatibility, distance matrix methods, and likelihood. It can also compute consensus trees, compute distances between trees, draw trees, resample data sets by bootstrapping or jackknifing, edit trees, and compute distance matrices.

Program information

  • written in C
  • Windows
  • Mac OS X
  • Mac OS 9
  • UNIX
  • Linux

Data types handled

  • nucleotide sequences
  • protein sequences
  • gene frequencies
  • restriction sites
  • restriction fragments
  • distances
  • discrete characters
  • continuous characters

Input Files

For most of the PHYLIP programs, information comes from a series of input files, and ends up in a series of output files:

                   -------------------
                  |                   |
infile ---------> |                   |
                  |                   |
intree ---------> |                   | -----------> outfile
                  |                   |
weights --------> |      program      | -----------> outtree
                  |                   |
categories -----> |                   | -----------> plotfile
                  |                   |
fontfile -------> |                   |
                  |                   |
                   -------------------

Input data such as DNA sequences come from a file whose default name is infile. If the user supplies a tree, it is read from a file whose default name is intree. Weights for the characters are read from weights, and the tree-plotting programs need digitized fonts, which are supplied in fontfile. All of these are default names.
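
Because the programs look for fixed default names, one convenient way to prepare a run is to copy each input into a working directory under the name the program expects. The sketch below illustrates this; the helper `stage_inputs` and the idea of a per-run directory are illustrative assumptions, not part of PHYLIP, and it assumes the program is started from that directory.

```python
import shutil
from pathlib import Path

# Default input names taken from the diagram above.
DEFAULT_INPUT_NAMES = ("infile", "intree", "weights", "categories", "fontfile")


def stage_inputs(run_dir, **sources):
    """Copy input files into run_dir under PHYLIP's default names.

    Hypothetical helper: keyword names must be one of DEFAULT_INPUT_NAMES,
    and each value is the path of the file to copy.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    staged = {}
    for role, src in sources.items():
        if role not in DEFAULT_INPUT_NAMES:
            raise ValueError(f"unknown input role: {role!r}")
        dest = run_dir / role
        shutil.copyfile(src, dest)  # overwrites an existing file of that name
        staged[role] = dest
    return staged
```

For example, `stage_inputs("run1", infile="alignment.phy", intree="start.tre")` would leave `run1/infile` and `run1/intree` ready for a program started in `run1` (the source file names here are placeholders).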


The data file has the following layout:

  • first line: the number of species and the number of characters. These are in free format, separated by blanks.
  • next lines: information for each species, starting with a ten-character species name and continuing with the characters for that species. The name can include blanks and some punctuation marks; if it is shorter than ten characters, it must be padded with blanks to the full ten. The name should be on the same line as the first character of the data for that species.
  • In the discrete-character programs, DNA sequence programs and protein sequence programs the characters are each a single letter or digit, sometimes separated by blanks. In the continuous-characters programs they are real numbers with decimal points, separated by blanks:
    Latimeria 2.03 3.457 100.2 0.0 -3.7
  • The conventions about continuing the data beyond one line per species are different between the molecular sequence programs and the others. The molecular sequence programs can take the data in “aligned” or “interleaved” format, in which we first have some lines giving the first part of each of the sequences, then some lines giving the next part of each, and so on. Thus the sequences might look like this:
    6   39
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC

TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
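
The interleaved layout above can be read back into one sequence per species by taking names from the first block only and appending each later block line to the species in the same position. The following is a minimal sketch under those assumptions; `parse_interleaved` is a hypothetical helper, not a PHYLIP routine, and it assumes blank lines only separate blocks and every block lists all species in the same order.

```python
EXAMPLE = """\
    6   39
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC

TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
"""


def parse_interleaved(text, name_width=10):
    """Return a list of (name, sequence) pairs from interleaved PHYLIP text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header: number of species and number of characters, free format.
    n_species, n_chars = (int(tok) for tok in lines[0].split()[:2])
    body = lines[1:]
    if not body or len(body) % n_species != 0:
        raise ValueError("every block must contain one line per species")

    names, seqs = [], []
    for i, line in enumerate(body):
        if i < n_species:
            # First block: fixed-width name, then the first part of the data.
            names.append(line[:name_width].strip())
            seqs.append("".join(line[name_width:].split()))
        else:
            # Later blocks: data only, in the same species order.
            seqs[i % n_species] += "".join(line.split())

    for name, seq in zip(names, seqs):
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
    return list(zip(names, seqs))
```

Slicing the name by position rather than splitting on whitespace matters here: names such as "B. virgini" contain a blank, and names that fill all ten columns run directly into the data.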

For the parsimony, compatibility, and maximum likelihood programs (but not the distance matrix programs), the simplest version of the input data file looks something like this:

   6   13
Archaeopt CGATGCTTAC CGC
HesperorniCGTTACTCGT TGT
BaluchitheTAATGTTAAT TGT
B. virginiTAATGTTCGT TGT
BrontosaurCAAAACCCAT CAT
B.subtilisGGCAGCCAAT CAC
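
A file in this simple layout can be produced from (name, sequence) pairs by padding or truncating each name to ten columns and writing the data on the same line. The sketch below is one way to do that; `format_phylip` is a hypothetical helper, and the header field widths and the grouping of data into blocks of ten are cosmetic choices that mimic the example above, since the header is free format and blanks between characters are optional.

```python
def format_phylip(records, name_width=10, group=10):
    """Format (name, sequence) pairs as a one-line-per-species PHYLIP file."""
    if not records:
        raise ValueError("at least one species is required")
    n_chars = len(records[0][1])
    # Header: number of species, then number of characters.
    lines = [f"{len(records):4d}{n_chars:5d}"]
    for name, seq in records:
        if len(seq) != n_chars:
            raise ValueError(f"{name}: all sequences must have {n_chars} characters")
        # Names occupy exactly name_width columns: truncate long ones, pad short ones.
        padded = name[:name_width].ljust(name_width)
        chunks = " ".join(seq[i:i + group] for i in range(0, len(seq), group))
        lines.append(padded + chunks)
    return "\n".join(lines) + "\n"
```

Truncation can make two long names identical (for example, two species of one genus), so it is worth checking that the shortened names are still unique before writing the file.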

How to cite

phylip.1196939069.txt.gz · Last modified: 2008/07/22 13:30 (external edit)