perslab/CELLECT

cellect-genes.snakefile error (issue with the cellect_magma subworkflow)

Opened this issue · 1 comments

Problem description

When running cellect-genes.snakefile the get_corrected_pvals does not get executed in cellect_magma sub workflow. This leads to the error No rule to produce...resid_correct_all.gsa.genes.pvals.out.

I speculate the error has to do with a config file being loaded again in the cellect_magma subworkflow.

Error message

(py3_PT) pnt@hydra:/opt/software/pnt/CELLECT> snakemake --use-conda -j -s cellect-genes.snakefile --configfile config-DUMMY.yml
Loaded config file from --configfile argument
Building DAG of jobs...
Executing subworkflow cellect_magma.
Loading default config file: config.yml
Building DAG of jobs...
MissingRuleException:
No rule to produce /XXX/CELLECT_out/CELLECT-MAGMA/precomputation/COPD_Sakornsakolpat2019/COPD_Sakornsakolpat2019.resid_correct_all.gsa.genes.pvals.out (if you use input functions make sure that they don't raise unexpected exceptions).

Temporary solution

The command snakemake --use-conda -j -s cellect-genes.snakefile --configfile config-DUMMY.yml will run without error when the subworkflow cellect_magma have been out-commented from cellect-genes.snakefile (https://github.com/perslab/CELLECT/blob/master/cellect-genes.snakefile#L92-L94) and get_corrected_pvals have been copied into the workflow:

# Outcomment 
# subworkflow cellect_magma:
#     snakefile:
#         "cellect-magma.snakefile"

# Copy rule from cellect_magma
rule get_corrected_pvals:
        '''
    Calculate gene-based p-values (corrected) from the corrected Z-stats. These are used only for cellect-genes, while prioritizing is done with Z stats. (results would be identical of course)
        '''
        input:
            "{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.out"
        output:
            "{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out"
        conda:
            "envs/cellectpy3.yml"
        params:
            gwas_name = lambda wildcards: GWAS_SUMSTATS[wildcards.gwas]['id'],
            base_output_dir = "{BASE_OUTPUT_DIR}"
        script:
            "scripts/convert_zstat_to_pval.py"

# Single line modification to this rule to remove dependency of cellect_magma 
rule get_effector_genes:
        '''
        Extract top percentile of ESmu > 0.0 genes 
        Extract top n MAGMA genes
        Take intersection
        '''        
        input: 
                expand("{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out", BASE_OUTPUT_DIR=BASE_OUTPUT_DIR,gwas=list(GWAS_SUMSTATS.keys()))    
            # cellect_magma(expand("{BASE_OUTPUT_DIR}/precomputation/{gwas}/{gwas}.resid_correct_all.gsa.genes.pvals.out", BASE_OUTPUT_DIR=BASE_OUTPUT_DIR,gwas=list(GWAS_SUMSTATS.keys()))) 
        output:
                "{CELLECT_GENES_OUTPUT_DIR}/out/{run_prefix}__{gwas}.effector_genes.txt"
        conda:
                "envs/cellectpy3.yml"
        log:
                "{CELLECT_GENES_OUTPUT_DIR}/logs/log.get_effector_genes_snake.{run_prefix}.{gwas}.txt"
        params: 
                magma_output_dir = BASE_OUTPUT_DIR,
                cellect_genes_output_dir = CELLECT_GENES_OUTPUT_DIR,
                specificity_matrix_file = lambda wildcards: SPECIFICITY_INPUT[wildcards.run_prefix]['path'],
                specificity_matrix_name = "{run_prefix}",
                gwas_name = lambda wildcards: GWAS_SUMSTATS[wildcards.gwas]['id'],
                n_genes_magma = config['N_GENES_MAGMA'],
                percentile_cutoff_esmu = config['PERCENTILE_CUTOFF_ESMU']
        script: 
                "scripts/get_effector_genes.py"

@JonThom or @Tobi1kenobi : do you have any ideas for how to solve this?