genxnetwork/grape

Snakemake wildcard expansion fails in preprocessing.smk

Closed this issue · 3 comments

Hi.

I built the Docker image from the latest release as per the instructions, downloaded the reference, and then ran

docker run --rm -it -v /home/danilovk/relatives_finder/media:/media -v /etc/localtime:/etc/localtime:ro genx_relatives:latest launcher.py preprocess --ref-directory /media/ref --assembly hg38 --vcf-file /media/vcf_source/current_merged_cut.vcf.gz --directory /media/results

without any changes to config.yaml or anything else. This resulted in the following error:

Namespace(alpha=0.01, alt_hom_samples=1.0, assembly='hg38', chip='background.vcf.gz', client=False, command='preprocess', conda_prefix='/tmp', configfile='config.yaml', cores=15, directory='/media/results', flow='ibis', het_samples=5.0, ibis_min_snp=500, ibis_seg_len=7.0, impute=False, input='input', memory=61, missing_samples=15.0, num_batches=1, phase=False, real_run=False, ref_directory='/media/ref', remove_imputation=False, rule=None, samples='samples.tsv', seed=8031841, sim_params_file='params/Relatives.def', sim_samples_file='ceph_unrelated_all.tsv', snakefile='', stat_file='stat_file.txt', target=['all'], unlock=False, until=None, use_bundle=False, vcf_file='/media/vcf_source/current_merged_cut.vcf.gz', weight_mask=None, zero_seg_count=0.5, zero_seg_len=5.0)

environ({'PATH': '/opt/conda/envs/snakemake/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/Minimac3Executable/bin:/opt/germline/bin', 'HOSTNAME': '48d1f7045e53', 'TERM': 'xterm', 'DEBIAN_FRONTEND': 'noninteractive', 'LANG': 'C.UTF-8', 'SHELL': '/bin/bash', 'HOME': '/root', 'KMP_DUPLICATE_LIB_OK': 'True', 'KMP_INIT_AT_FORK': 'FALSE', 'CONDA_ENVS_PATH': '/tmp/envs', 'CONDA_PKGS_DIRS': '/tmp/conda/pkgs'})
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Conda environment /src/repo/rules/../envs/evaluation.yaml will be created.
Conda environment /src/repo/rules/../envs/bcf_plink.yaml will be created.
Conda environment /src/repo/rules/../envs/bcftools.yaml will be created.
Conda environment /src/repo/rules/../envs/liftover.yaml will be created.
Conda environment /src/repo/rules/../envs/plink.yaml will be created.
Conda environment /src/repo/rules/../envs/ibis.yaml will be created.
Job stats:
job                                     count    min threads    max threads
------------------------------------  -------  -------------  -------------
all                                         1              1              1
copy_batch                                  1              1              1
copy_imputation                             1              1              1
copy_phase                                  1              1              1
copy_vcf                                    1              1              1
liftover                                    1              1              1
plink_clean_up                              1              1              1
plink_filter                                1              1              1
pre_imputation_check                        1              1              1
prepare_vcf                                 1              1              1
recode_snp_ids                              1              1              1
select_bad_samples                          1              1              1
single_batch_convert_mapped_to_plink        1              1              1
single_batch_ibis_mapping                   1              1              1
vcf_stats                                   1              1              1
total                                      15              1              1


[Sun Dec 25 21:58:26 2022]
rule copy_batch:
    input: input.vcf.gz
    output: vcf/batch1.vcf.gz
    jobid: 10
    reason: Missing output files: vcf/batch1.vcf.gz
    resources: tmpdir=/tmp


            cp input.vcf.gz vcf/batch1.vcf.gz


[Sun Dec 25 21:58:26 2022]
rule copy_vcf:
    input: vcf/batch1.vcf.gz
    output: vcf/batch1_imputation_removed.vcf.gz
    jobid: 9
    reason: Missing output files: vcf/batch1_imputation_removed.vcf.gz; Input files updated by another job: vcf/batch1.vcf.gz
    wildcards: batch=batch1
    resources: tmpdir=/tmp


                cp vcf/batch1.vcf.gz vcf/batch1_imputation_removed.vcf.gz


[Sun Dec 25 21:58:26 2022]
rule vcf_stats:
    input: vcf/batch1.vcf.gz
    output: stats/batch1_lifted_vcf.txt, stats/batch1_lifted_vcf.psc
    jobid: 12
    reason: Missing output files: stats/batch1_lifted_vcf.psc; Input files updated by another job: vcf/batch1.vcf.gz
    wildcards: batch=batch1
    resources: tmpdir=/tmp


            bcftools query --list-samples vcf/batch1.vcf.gz > vcf/batch1_merged_lifted.vcf.samples
            bcftools stats -S vcf/batch1_merged_lifted.vcf.samples vcf/batch1.vcf.gz > stats/batch1_lifted_vcf.txt
            # PSC means per-sample counts
            cat stats/batch1_lifted_vcf.txt | grep '^PSC' > stats/batch1_lifted_vcf.psc

Would remove temporary output vcf/batch1.vcf.gz

[Sun Dec 25 21:58:26 2022]
rule liftover:
    input: vcf/batch1_imputation_removed.vcf.gz
    output: vcf/batch1_merged_lifted.vcf.gz
    log: logs/liftover/liftoverbatch1.log
    jobid: 8
    reason: Missing output files: vcf/batch1_merged_lifted.vcf.gz; Input files updated by another job: vcf/batch1_imputation_removed.vcf.gz
    wildcards: batch=batch1
    resources: tmpdir=/tmp, mem_mb=20480

RuleException in rule liftover in line 97 of /src/repo/workflows/preprocess2/../../rules/preprocessing.smk:
NameError: The name 'batch' is unknown in this context. Did you mean 'wildcards.batch'?, when formatting the following:

               java -Xmx{params.mem_gb}g -jar /picard/picard.jar LiftoverVcf WARN_ON_MISSING_CONTIG=true MAX_RECORDS_IN_RAM=5000 I={input.vcf} O={output.vcf} CHAIN={LIFT_CHAIN} REJECT=vcf/chr{batch}_rejected.vcf.gz R={GRCH37_FASTA} |& tee -a {log}

Traceback (most recent call last):
  File "launcher.py", line 453, in <module>
    raise ValueError("Pipeline failed see Snakemake error message for details")
ValueError: Pipeline failed see Snakemake error message for details

Could you please check preprocessing.smk and the other Snakemake files? This looks like a pure Snakemake error. The input file is fine and has been validated.

By the way, the same wildcards.batch issue also applies to rules/imputation.smk, line 138.
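
For illustration, here is a minimal sketch of why the bare `{batch}` placeholder fails while `{wildcards.batch}` resolves. It uses plain Python `str.format` as an analogy and is not Snakemake's actual formatter: `render` and the `SimpleNamespace` stand-in for `wildcards` are hypothetical, and only the `REJECT=` fragment of the shell command from the log is reproduced. Plain Python raises a `KeyError` here, whereas Snakemake reports a `NameError`.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the `wildcards` object Snakemake exposes in a rule.
# The value "batch1" is taken from the "wildcards: batch=batch1" line in the log.
wildcards = SimpleNamespace(batch="batch1")

# Fragment of the shell command from the liftover rule, as it appears in the log.
broken = "REJECT=vcf/chr{batch}_rejected.vcf.gz"
# The form suggested by the error message ("Did you mean 'wildcards.batch'?").
fixed = "REJECT=vcf/chr{wildcards.batch}_rejected.vcf.gz"


def render(template: str) -> str:
    """Format a template with only `wildcards` in scope (an assumed simplification)."""
    return template.format(wildcards=wildcards)


try:
    render(broken)
    broken_error = None
except KeyError as exc:
    # `batch` is not a top-level name, so the lookup fails.
    broken_error = exc

fixed_result = render(fixed)
print(fixed_result)
```

In a rule's shell string, the wildcard is reachable only through the `wildcards` object, so the same change would presumably apply to the placeholder in rules/imputation.smk as well.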

@danilovkiri Hi, Kirill! Thank you for pointing out this issue! I have just applied and tested the fixes, so everything should work now. If you run into any other bugs, please report them; this helps improve the code, and I will respond as quickly as possible. Also, Happy New Year! :santa:

@Jahysama Thank you, Egor. Happy New Year:)