Meeting on 14.12.07
design of a new data format to store population genetics data:
XML or XML-like
not too complex, so that you can still look at your data in text form
e.g.:
header: Name, Population, number of loci, history of loci (IDs), (loci information), number of individuals/sequences, interleaved data,…
data: in a table-like form with IDs, Frequency, Data (<ID>…</ID> <Freq>…</Freq> <Genot>…</Genot>)
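A minimal sketch of what such a file could look like, written and read back with Python's standard library. Only the <ID>, <Freq> and <Genot> tags come from the notes above; the names Dataset, Header, NumLoci, LociIDs, NumIndividuals, Data and Record are placeholders made up for illustration, and treating the frequency as a plain number is an assumption.

```python
import xml.etree.ElementTree as ET


def build_dataset(name, population, loci_ids, records):
    """Build an XML tree with a header and a table-like data block.

    records is a list of (id, frequency, genotype) tuples.
    All tag names except ID, Freq and Genot are hypothetical.
    """
    root = ET.Element("Dataset")

    header = ET.SubElement(root, "Header")
    ET.SubElement(header, "Name").text = name
    ET.SubElement(header, "Population").text = population
    ET.SubElement(header, "NumLoci").text = str(len(loci_ids))
    ET.SubElement(header, "LociIDs").text = " ".join(loci_ids)
    ET.SubElement(header, "NumIndividuals").text = str(len(records))

    data = ET.SubElement(root, "Data")
    for rec_id, freq, genot in records:
        row = ET.SubElement(data, "Record")
        ET.SubElement(row, "ID").text = rec_id
        ET.SubElement(row, "Freq").text = str(freq)
        ET.SubElement(row, "Genot").text = genot
    return root


def read_records(root):
    """Return the data block as a list of (id, frequency, genotype) tuples."""
    return [
        (r.findtext("ID"), float(r.findtext("Freq")), r.findtext("Genot"))
        for r in root.iter("Record")
    ]


records = [("h1", 3.0, "AC"), ("h2", 1.0, "AT")]
tree = build_dataset("example", "pop1", ["locus1", "locus2"], records)

# Indent the tree so the file stays readable in a text editor
# (ET.indent exists from Python 3.9 on; skipped on older versions).
if hasattr(ET, "indent"):
    ET.indent(tree)

text = ET.tostring(tree, encoding="unicode")
print(text)
```

One record per row keeps the data block close to a table, so it can still be checked by eye in a text editor, which was the main requirement above.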
also look at different databases/formats:
ENSEMBL (in FASTA format)
HapMap:
HapMap:ENCODE Data
HGDP
ENCODE (resequencing HapMap)
A Prototype Object Database for Mitochondrial DNA Variation -- Neigel and Leberg 95 (1): 85 -- Journal of Heredity
TraitMap: an XML-based genetic-map database combining multigenic loci and biomolecular networks -- Heida et al. 20 (Supplement 1): i152 -- Bioinformatics
The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet
SOLEXA: information on how Solexa data are handled and concatenated
FASTA
(The FASTA format also seems to be widely used and would be worth a look.)
R-lequin:
gives information on how the XML tags in the Arlequin output are structured and set (good for visualisation and for extracting data to make graphics)
to do by January