cellect-genes.snakefile error (issue with the cellect_magma subworkflow)
Opened this issue · 1 comments
pascaltimshel commented
Problem description
When running cellect-genes.snakefile
the get_corrected_pvals
does not get executed in cellect_magma
sub workflow. This leads to the error No rule to produce...resid_correct_all.gsa.genes.pvals.out
.
I speculate the error has to do with a config file being loaded again in the cellect_magma subworkflow.
Error message
(py3_PT) pnt@hydra:/opt/software/pnt/CELLECT> snakemake --use-conda -j -s cellect-genes.snakefile --configfile config-DUMMY.yml
Loaded config file from --configfile argument
Building DAG of jobs...
Executing subworkflow cellect_magma.
Loading default config file: config.yml
Building DAG of jobs...
MissingRuleException:
No rule to produce /XXX/CELLECT_out/CELLECT-MAGMA/precomputation/COPD_Sakornsakolpat2019/COPD_Sakornsakolpat2019.resid_correct_all.gsa.genes.pvals.out (if you use input functions make sure that they don't raise unexpected exceptions).
Temporary solution
The command snakemake --use-conda -j -s cellect-genes.snakefile --configfile config-DUMMY.yml
will run without error when the subworkflow cellect_magma have been out-commented from cellect-genes.snakefile (https://github.com/perslab/CELLECT/blob/master/cellect-genes.snakefile#L92-L94) and get_corrected_pvals
have been copied into the workflow:
# Outcomment
# subworkflow cellect_magma:
# snakefile:
# "cellect-magma.snakefile"
# Copy rule from cellect_magma
rule get_corrected_pvals:
'''
Calculate gene-based p-values (corrected) from the corrected Z-stats. These are used only for cellect-genes, while prioritizing is done with Z stats. (results would be identical of course)
'''
input:
"{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.out"
output:
"{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out"
conda:
"envs/cellectpy3.yml"
params:
gwas_name = lambda wildcards: GWAS_SUMSTATS[wildcards.gwas]['id'],
base_output_dir = "{BASE_OUTPUT_DIR}"
script:
"scripts/convert_zstat_to_pval.py"
# Single line modification to this rule to remove dependency of cellect_magma
rule get_effector_genes:
'''
Extract top percentile of ESmu > 0.0 genes
Extract top n MAGMA genes
Take intersection
'''
input:
expand("{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out", BASE_OUTPUT_DIR=BASE_OUTPUT_DIR,gwas=list(GWAS_SUMSTATS.keys()))
# cellect_magma(expand("{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out", BASE_OUTPUT_DIR=BASE_OUTPUT_DIR,gwas=list(GWAS_SUMSTATS.keys())))
output:
"{CELLECT_GENES_OUTPUT_DIR}/out/{run_prefix}__{gwas}.effector_genes.txt"
conda:
"envs/cellectpy3.yml"
log:
"{CELLECT_GENES_OUTPUT_DIR}/logs/log.get_effector_genes_snake.{run_prefix}.{gwas}.txt"
params:
magma_output_dir = BASE_OUTPUT_DIR,
cellect_genes_output_dir = CELLECT_GENES_OUTPUT_DIR,
specificity_matrix_file = lambda wildcards: SPECIFICITY_INPUT[wildcards.run_prefix]['path'],
specificity_matrix_name = "{run_prefix}",
gwas_name = lambda wildcards: GWAS_SUMSTATS[wildcards.gwas]['id'],
n_genes_magma = config['N_GENES_MAGMA'],
percentile_cutoff_esmu = config['PERCENTILE_CUTOFF_ESMU']
script:
"scripts/get_effector_genes.py"
pascaltimshel commented
@JonThom or @Tobi1kenobi : do you have any ideas for how to solve this?