/dockstore-tool-cms2

Primary LanguagePythonApache License 2.0Apache-2.0

CMS2 workflows

Workflows to run population genetics simulations and compute component scores for CMS2.

WDL workflow organization

The top-level workflow is cms2_main.wdl . It is really a wrapper workflow, meant to hide many internal details/parameters.

Docker images

The docker image that is used for component score computation tasks, is specified by docker/cms2-docker-component-stats/Dockerfile . It is automatically constructed by Travis CI on pushes to the repo cms2-docker-component-stats, and pushed to quay.io/ilya_broad/cms . Currently, after updating that, you have to manually update the docker runtime attribute in each task in task.wdl; this should be automated.

That docker image includes a customized version of selscan, which can save normalization stats computed from one set of data and then use them to normalize another set of data. (We compute normalization stats for neutral sims/regions, and then use that to normalize component stats for selection sims/regions.) Updating that currently requires tagging a new version at https://github.com/notestaff/selscan , then updating docker/cms2-docker-component-stats/bioconda-recipes/recipes/selscan/meta.yaml to use that tag. Then when a commit is pushed to cms2-docker-component-stats, Travis will build a new conda package for selscan and include it in the docker image.

Resources

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/20140625_related_individuals.txt Sample NA20318 was recorded in ‘integrated_call_samples_v2.20130502.ALL.ped’ as being unrelated and assigned family ID 2480a. Based on information from Coriell and IBD data, we believe that this sample is part of family 2480 and related to samples NA20317 and NA20319.

Ancestral allele (based on 1000 genomes reference data). The following comes from its original README file: ACTG - high-confidence call, ancestral state supproted by the other two sequences actg - low-confindence call, ancestral state supported by one sequence only N - failure, the ancestral state is not supported by any other sequence =- - the extant species contains an insertion at this postion . - no coverage in the alignment

genetic maps: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates/

recent genetic maps: https://advances.sciencemag.org/content/5/10/eaaw9206.full https://drive.google.com/file/d/17KWNaJQJuldfbL9zljFpqj5oPfUiJ0Nv/view?usp=sharing

which samples are in which pops; pedigree information: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped

populations and superpopulations: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/README_populations.md ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/20131219.populations.tsv ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/20131219.superpopulations.tsv

Neutral regions

http://nre.cb.bscb.cornell.edu/nre/run.html Neutral Region Explorer