Project 1: needlestack variant calling

Question

tdelhomme opened this issue 7 years ago · 4 comments

Run needlestack on TCGA data.
Given a cohort or a center, run first on a single gene, then on a whole BED file with parallelization.

Project source code and documentation are hosted here.

Answer 1 · 2018-02-26T10:55:04.000Z

Todo list:

Create a Dockerfile
Run needlestack without Nextflow: bash script needlestack.sh
Run needlestack on tumor-normal pairs: create a txt file listing the tumor-normal pairs (use the BAM file metadata to retrieve the TCGA barcodes)
Parallelization: create a BED file and a script to merge the VCF files
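
For the tumor-normal item, a minimal sketch of how the pairs could be derived from the barcodes, assuming the mapping from BAM file name to TCGA barcode has already been pulled from the metadata. It relies on the standard barcode layout, where the first three fields identify the patient and the two-digit sample-type code in the fourth field is 01-09 for tumor and 10-19 for normal. The function names and the two-column tab-separated output are assumptions for illustration, not the layout needlestack necessarily expects.

```python
from collections import defaultdict


def sample_type(barcode):
    """Classify a TCGA barcode as 'tumor' or 'normal' (None if neither)."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        return None
    code = fields[3][:2]  # e.g. "01A" -> "01"
    if not code.isdigit():
        return None
    if 1 <= int(code) <= 9:
        return "tumor"
    if 10 <= int(code) <= 19:
        return "normal"
    return None  # other codes (e.g. controls) are ignored


def pair_tumor_normal(bam_to_barcode):
    """Pair every tumor BAM with every normal BAM of the same patient."""
    by_patient = defaultdict(lambda: {"tumor": [], "normal": []})
    for bam, barcode in sorted(bam_to_barcode.items()):
        kind = sample_type(barcode)
        if kind is None:
            continue
        patient = "-".join(barcode.split("-")[:3])  # TCGA-<site>-<participant>
        by_patient[patient][kind].append(bam)
    pairs = []
    for patient in sorted(by_patient):
        for tumor in by_patient[patient]["tumor"]:
            for normal in by_patient[patient]["normal"]:
                pairs.append((tumor, normal))
    return pairs


def write_pairs(pairs, path):
    """Write one 'tumor<TAB>normal' line per pair (assumed layout)."""
    with open(path, "w") as handle:
        for tumor, normal in pairs:
            handle.write(f"{tumor}\t{normal}\n")
```

Patients with only a tumor or only a normal sample are skipped, so the resulting file only lists complete pairs.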

Answer 2 · 2018-02-26T12:33:05.000Z

The needlestack Dockerfile on Docker Hub may already work for the bash version; this needs to be checked.

Answer 3 · 2018-03-02T09:05:43.000Z

We created a new Dockerfile in needlestack/dev/bin, based on the needlestack Dockerfile, adding wget downloads of the R script dependencies and of the hg19/38 chromosomeNames2UCSC.txt file.

Answer 4 · 2018-03-05T09:16:16.000Z

To parallelize needlestack within a single task, we could use the scatter option, which is different from batch mode: https://docs.cancergenomicscloud.org/docs/about-parallelizing-tool-executions
https://docs.cancergenomicscloud.org/v1.0/blog/making-efficient-use-of-compute-resources#section-when-being-scattered-is-a-very-good-thing-optimising-a-whole-genome-analysis
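
Independently of how the platform schedules the scattered jobs, the split and merge steps could look roughly like the sketch below. It assumes plain-text BED and VCF files, that every chunk is called on the same set of samples (so the VCF headers match), and that the chunk VCFs are passed in genomic order. The names `split_bed` and `merge_vcf` are hypothetical; a dedicated tool such as `bcftools concat` is likely the more robust choice for the merge.

```python
def split_bed(bed_lines, n_chunks):
    """Split BED regions into at most n_chunks contiguous, near-equal chunks."""
    regions = [
        line for line in bed_lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
    n_chunks = max(1, min(n_chunks, len(regions)))
    size, extra = divmod(len(regions), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        chunks.append(regions[start:end])
        start = end
    return chunks


def merge_vcf(vcf_line_lists):
    """Concatenate chunk VCFs, keeping the header of the first one only."""
    merged = []
    for index, lines in enumerate(vcf_line_lists):
        for line in lines:
            if line.startswith("#"):
                if index == 0:
                    merged.append(line)
            elif line.strip():
                merged.append(line)
    return merged
```

Each chunk would be written to its own BED file and given to one needlestack job; the per-chunk VCFs are then read back and merged once all jobs have finished.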