Differences

This shows you the differences between two versions of the page.

--- meetings [2008/02/12 13:41] – heidi
+++ meetings [2008/10/20 15:42] (current) – heidi
@@ Line 5: / Line 5: @@
   * [[21.01.2008]]: file format
   * [[12.02.2008]]: file format
+  * [[03.03.2008]]: file format / converter
+  * [[14.03.2008]]: Arlequin output
+  * [[17.03.2008]]: file format
+  * ...
+  * [[06.06.2008]]: converter
+  * [[23.07.2008]]: end of Master
+  * [[20.10.2008]]: last changes in R-lequin / writing
 ===== e-mails =====
-   * 10.02.2008:<code>
+  * [[10.02.08]]: Laurent: file format (Howard Cann)
-Dear Howard,
+  * [[01.05.08]]: Laurent: phyloXML
+  * [[21.05.08]]: Tinu: converter from AFLPDat to Matthieu's selection program
-Howard Cann wrote:
+  * [[09.06.08]]: Laurent: Blackwell Synergy - Mol Ecol Res, Volume 8 Issue 3 Page 578-580, May 2008 (Article Abstract)
-> Dear Laurent,
->
-> Until now, the HGDP-CEPH diversity panel database has stored and
-> displayed marker genotypes generated on the panel population samples.
-It
-> is time that we receive sequences from panel users who are
-> resequencing in the panel in order to study human variation, estimate
-> diversity indices, describe human demography/history, etc.
-Sounds good!
-> What file formats should we be ready to receive?
-Some flat file format would be good to have, but I think there is no
-general agreement on how these large resequencing files should be
-formatted. I have a MSc student with whom we are beginning to think
-about such a format. We are investigating the possibility to have some
-xml coded file, that would be efficient for resequencing data, but we
-are still in the first development phase.
-> How should we suggest to contributors to code their sequence data (nt
-> letters or numbers), to code missing bases.......?  Etc.
-I guess it would be most useful if sequences would be grouped by
-population, with info on:
-Sequence region (chromosomic region)
-Sequence begin
-Sequence length
-Population where it was sequenced
-Individual in which it was sequenced
-Geographic coordinate of the population or of the individual
-Linguistic group or language family of the individual or population
-Tag indicating is sequence phase has been inferred, with pointer to the
-other complementary sequence
-Nucleotide should be coded as ACGT, and ? for missing data (which makes
-intuitive sense), otherwise common letters for ambiguous nucleotide
-assignment
-> What other questions should I be asking you in order to set up a
-> sequence db that will be useful to scientists in the field of human
-> population genetics.
-Some information on whether it is coding sequence or not, with the start
-of the coding region, would be nice to have. Some link to some other
-data base (e.g. ensembl) where additional information can be found would
-be nice as well.
-> I think that CEPH should be concerned with managing and maintaining
-> the sequences in the db and not with computing various parmeters of
-> polymorphism, diversity etc. from them, which most of th panel users
-> are capable of doing.
-Yes, you are right, but some summary statistics could be useful to
-compute.
-It would also be nice to be able to extract, say all sequences or
-polymorphism in a given chromosomal region.
-Cheers
-laurent
-</code>