

Master thesis meetings

e-mails

  • 10.02.2008:
    Dear Howard,
    
    Howard Cann wrote:
    > Dear Laurent,
    >
    > Until now, the HGDP-CEPH diversity panel database has stored and
    > displayed marker genotypes generated on the panel population samples. It
    > is time that we receive sequences from panel users who are 
    > resequencing in the panel in order to study human variation, estimate 
    > diversity indices, describe human demography/history, etc. 
    
    Sounds good!
    
    > What file formats should we be ready to receive? 
    
    Some flat file format would be good to have, but I think there is no 
    general agreement on how these large resequencing files should be 
    formatted. I have an MSc student with whom we are beginning to think 
    about such a format. We are investigating the possibility of an 
    XML-coded file that would be efficient for resequencing data, but we 
    are still in the first development phase.
    > How should we suggest to contributors to code their sequence data (nt 
    > letters or numbers), to code missing bases.......?  Etc. 
    I guess it would be most useful if sequences were grouped by 
    population, with information on:
    
    Sequence region (chromosomal region)
    Sequence start position
    Sequence length
    Population in which it was sequenced
    Individual in which it was sequenced
    Geographic coordinates of the population or of the individual
    Linguistic group or language family of the individual or population
    Tag indicating whether the sequence phase has been inferred, with a 
    pointer to the other, complementary sequence
    
    Nucleotides should be coded as ACGT, with ? for missing data (which makes 
    intuitive sense), and otherwise the common letters for ambiguous 
    nucleotide assignments.
    
    > What other questions should I be asking you in order to set up a 
    > sequence db that will be useful to scientists in the field of human 
    > population genetics. 
    Some information on whether it is a coding sequence or not, with the start
    of the coding region, would be nice to have. A link to another database 
    (e.g. Ensembl) where additional information can be found would be nice as
    well.
    > I think that CEPH should be concerned with managing and maintaining 
    > the sequences in the db and not with computing various parameters of 
    > polymorphism, diversity etc. from them, which most of the panel users 
    > are capable of doing.
    Yes, you are right, but some summary statistics could be useful to
    compute.
    
    It would also be nice to be able to extract, say, all sequences or 
    polymorphisms in a given chromosomal region.
    
    Cheers
    
    laurent
  • 09.06.2008:
    Hi Heidi,
    
    please have a look at the following paper and program...
    
    http://www.blackwell-synergy.com/links/doi/10.1111/j.1471-8286.2007.02036.x
    
    It would be worth looking at...
    
    cheers
    laurent
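The wish list in the 10.02.2008 e-mail (region, start, length, population, individual, coordinates, language family, a phase tag pointing to the complementary sequence, ACGT coding with ? for missing data) and the request to extract sequences or polymorphisms in a given chromosomal region can be sketched as a small record type plus two queries. This is a minimal Python sketch under stated assumptions, not a format agreed with CEPH: the names `SequenceRecord`, `in_region` and `polymorphic_sites` are hypothetical, the extra letters accepted for ambiguous calls are assumed to be the IUPAC ambiguity codes, and sequences are assumed to be gap-free and aligned to reference coordinates, so that each base sits at `begin + offset`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

# Accepted letters: A, C, G, T, "?" for missing data, and (an assumption)
# the IUPAC ambiguity codes as the "common letters" for ambiguous calls.
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")
ALLOWED = set("ACGT?") | IUPAC_AMBIGUITY


@dataclass
class SequenceRecord:
    """One resequenced fragment; every field name here is hypothetical."""
    chromosome: str        # sequence region, e.g. "chr7"
    begin: int             # reference position of the first base (assumed)
    sequence: str          # gap-free, aligned to the reference (assumed)
    population: str
    individual: str
    latitude: float        # geographic coordinates of population/individual
    longitude: float
    language_family: str
    phase_inferred: bool = False
    partner_id: Optional[str] = None     # complementary phased sequence
    coding_start: Optional[int] = None   # start of the coding region, if any
    external_link: Optional[str] = None  # e.g. an Ensembl page

    def __post_init__(self) -> None:
        # Reject any letter outside the agreed coding at construction time.
        bad = set(self.sequence) - ALLOWED
        if bad:
            raise ValueError(f"invalid nucleotide codes: {sorted(bad)}")

    @property
    def length(self) -> int:
        # Sequence length is derived from the sequence rather than stored.
        return len(self.sequence)

    @property
    def end(self) -> int:
        # Reference position of the last base (inclusive).
        return self.begin + self.length - 1


def in_region(records: Iterable[SequenceRecord], chromosome: str,
              start: int, stop: int) -> List[SequenceRecord]:
    """Records on `chromosome` that overlap the closed interval [start, stop]."""
    return [r for r in records
            if r.chromosome == chromosome and r.begin <= stop and r.end >= start]


def polymorphic_sites(records: Iterable[SequenceRecord], chromosome: str,
                      start: int, stop: int) -> List[int]:
    """Positions in [start, stop] where more than one of A/C/G/T is observed.

    Missing ("?") and ambiguous letters are ignored.
    """
    seen: Dict[int, Set[str]] = {}
    for r in in_region(records, chromosome, start, stop):
        for offset, base in enumerate(r.sequence):
            pos = r.begin + offset
            if start <= pos <= stop and base in "ACGT":
                seen.setdefault(pos, set()).add(base)
    return sorted(pos for pos, bases in seen.items() if len(bases) > 1)


# Toy data, made up for illustration only.
recs = [
    SequenceRecord("chr7", 100, "ACGTA", "popA", "ind1", 0.0, 0.0, "familyA"),
    SequenceRecord("chr7", 102, "GTTA?", "popB", "ind2", 0.0, 0.0, "familyB"),
]
print(polymorphic_sites(recs, "chr7", 100, 110))  # [104]
```

Deriving the length from the sequence keeps the stored length and the actual data from disagreeing. A real schema would also have to handle insertions and deletions, which this sketch ignores by assuming gap-free alignment.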
Last modified: 2008/07/22 13:30