10.02.08

Differences

This shows you the differences between two versions of the page.

10.02.08 [2008/06/10 09:15] – created heidi
10.02.08 [2008/07/22 13:31] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== e-mail: 10.02.08 ======
 +Dear Howard,\\
 +
 +Howard Cann wrote:
 +> Dear Laurent,
 +>
 +> Until now, the HGDP-CEPH diversity panel database has stored and
 +> displayed marker genotypes generated on the panel population samples.
 +> It is time that we receive sequences from panel users who are 
 +> resequencing in the panel in order to study human variation, estimate 
 +> diversity indices, describe human demography/history, etc. 
 +
 +Sounds good!
 +
 +> What file formats should we be ready to receive? 
 +
 +Some flat file format would be good to have, but I think there is no 
 +general agreement on how these large resequencing files should be 
 +formatted. I have an MSc student with whom I am beginning to think 
 +about such a format. We are investigating the possibility of using an 
 +XML-encoded file that would be efficient for resequencing data, but we 
 +are still in the first development phase.
 +
 +> How should we suggest that contributors code their sequence data (nt 
 +> letters or numbers) and missing bases? Etc. 
 +
 +I guess it would be most useful if sequences were grouped by 
 +population, with info on:\\
 +
 +Sequence region (chromosomal region)
 +Sequence start
 +Sequence length
 +Population in which it was sequenced
 +Individual in which it was sequenced
 +Geographic coordinates of the population or of the individual
 +Linguistic group or language family of the individual or population
 +Tag indicating whether sequence phase has been inferred, with a pointer
 +  to the other, complementary sequence
 +
 +Nucleotides should be coded as ACGT, with ? for missing data (which makes 
 +intuitive sense), and the common letters for ambiguous nucleotide 
 +assignments otherwise
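
The fields listed above could be held in a record like the following. This is only a minimal sketch: the class name `SequenceRecord`, all attribute names, and the choice to read "common letters" as the standard IUPAC ambiguity codes are assumptions made for illustration, not an agreed format.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Assumed alphabet: unambiguous bases, '?' for missing data, and the
# IUPAC ambiguity codes as the "common letters" for ambiguous calls.
ALLOWED_SYMBOLS = set("ACGT") | {"?"} | set("RYSWKMBDHVN")


@dataclass
class SequenceRecord:
    """One resequenced stretch in one individual (hypothetical layout)."""
    region: str                      # chromosomal region
    begin: int                       # sequence start
    population: str                  # population in which it was sequenced
    individual: str                  # individual in which it was sequenced
    sequence: str                    # nucleotides
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    language_family: Optional[str] = None
    phase_inferred: bool = False     # True if phase was inferred
    complement_id: Optional[str] = None  # pointer to the complementary sequence

    @property
    def length(self) -> int:
        # The sequence length is derived rather than stored twice.
        return len(self.sequence)

    def invalid_symbols(self) -> Set[str]:
        # Symbols that fall outside the assumed coding scheme.
        return set(self.sequence.upper()) - ALLOWED_SYMBOLS
```

A contributor's file could then be checked record by record, rejecting any record for which `invalid_symbols()` is not empty.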
 +
 +> What other questions should I be asking you in order to set up a 
 +> sequence db that will be useful to scientists in the field of human 
 +> population genetics?
 + 
 +Some information on whether the sequence is coding or not, with the
 +start of the coding region, would be nice to have. A link to another
 +database (e.g. Ensembl) where additional information can be found would
 +be nice as well.
 +
 +> I think that CEPH should be concerned with managing and maintaining 
 +> the sequences in the db and not with computing various parameters of 
 +> polymorphism, diversity etc. from them, which most of the panel users 
 +> are capable of doing.
 +
 +Yes, you are right, but it could be useful to compute some summary
 +statistics.\\
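
As an illustration of the kind of summary statistics meant here, the sketch below computes the number of segregating sites and the per-site nucleotide diversity for a set of aligned sequences. The choice of these two statistics, the function names, and the rule of skipping any position that is not an unambiguous A, C, G or T are assumptions for the example only.

```python
from itertools import combinations
from typing import Sequence

UNAMBIGUOUS = "ACGT"


def segregating_sites(seqs: Sequence[str]) -> int:
    """Count alignment columns showing more than one unambiguous base."""
    count = 0
    for column in zip(*seqs):
        bases = {b for b in column if b in UNAMBIGUOUS}
        if len(bases) > 1:
            count += 1
    return count


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean proportion of differing sites over all pairs of sequences.

    Positions where either sequence is missing ('?') or ambiguous are
    left out of that pair's comparison.
    """
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b)
                    if x in UNAMBIGUOUS and y in UNAMBIGUOUS]
        if not compared:
            continue  # nothing comparable in this pair
        total += sum(x != y for x, y in compared) / len(compared)
        pairs += 1
    return total / pairs if pairs else 0.0
```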
 +
 +It would also be nice to be able to extract, say, all sequences or 
 +polymorphisms in a given chromosomal region.\\
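
Extraction by region amounts to an interval-overlap query. The following is a rough sketch, assuming 1-based inclusive coordinates and a hypothetical minimal record type `StoredSequence`; a real database would do this with an indexed query rather than a linear scan.

```python
from typing import Iterable, List, NamedTuple


class StoredSequence(NamedTuple):
    """Hypothetical minimal record: where a sequence sits and its bases."""
    region: str     # chromosomal region, e.g. a chromosome name
    begin: int      # 1-based start position (assumption)
    sequence: str


def sequences_in_region(records: Iterable[StoredSequence],
                        region: str, start: int, end: int) -> List[StoredSequence]:
    """Return every record that overlaps [start, end] on the given region."""
    hits = []
    for rec in records:
        rec_end = rec.begin + len(rec.sequence) - 1  # inclusive end
        if rec.region == region and rec.begin <= end and rec_end >= start:
            hits.append(rec)
    return hits
```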
 +
 +Cheers
 +laurent
  
10.02.08.1213082158.txt.gz · Last modified: 2008/07/22 13:29 (external edit)