Snakemake expansion fails in preprocessing.smk
Closed this issue · 3 comments
Hi.
I built the Docker image from the latest release as per the instructions, downloaded the reference, and then ran
docker run --rm -it -v /home/danilovk/relatives_finder/media:/media -v /etc/localtime:/etc/localtime:ro genx_relatives:latest launcher.py preprocess --ref-directory /media/ref --assembly hg38 --vcf-file /media/vcf_source/current_merged_cut.vcf.gz --directory /media/results
without any changes to config.yaml or anything else. The run failed with the following error:
Namespace(alpha=0.01, alt_hom_samples=1.0, assembly='hg38', chip='background.vcf.gz', client=False, command='preprocess', conda_prefix='/tmp', configfile='config.yaml', cores=15, directory='/media/results', flow='ibis', het_samples=5.0, ibis_min_snp=500, ibis_seg_len=7.0, impute=False, input='input', memory=61, missing_samples=15.0, num_batches=1, phase=False, real_run=False, ref_directory='/media/ref', remove_imputation=False, rule=None, samples='samples.tsv', seed=8031841, sim_params_file='params/Relatives.def', sim_samples_file='ceph_unrelated_all.tsv', snakefile='', stat_file='stat_file.txt', target=['all'], unlock=False, until=None, use_bundle=False, vcf_file='/media/vcf_source/current_merged_cut.vcf.gz', weight_mask=None, zero_seg_count=0.5, zero_seg_len=5.0)
environ({'PATH': '/opt/conda/envs/snakemake/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/Minimac3Executable/bin:/opt/germline/bin', 'HOSTNAME': '48d1f7045e53', 'TERM': 'xterm', 'DEBIAN_FRONTEND': 'noninteractive', 'LANG': 'C.UTF-8', 'SHELL': '/bin/bash', 'HOME': '/root', 'KMP_DUPLICATE_LIB_OK': 'True', 'KMP_INIT_AT_FORK': 'FALSE', 'CONDA_ENVS_PATH': '/tmp/envs', 'CONDA_PKGS_DIRS': '/tmp/conda/pkgs'})
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Conda environment /src/repo/rules/../envs/evaluation.yaml will be created.
Conda environment /src/repo/rules/../envs/bcf_plink.yaml will be created.
Conda environment /src/repo/rules/../envs/bcftools.yaml will be created.
Conda environment /src/repo/rules/../envs/liftover.yaml will be created.
Conda environment /src/repo/rules/../envs/plink.yaml will be created.
Conda environment /src/repo/rules/../envs/ibis.yaml will be created.
Job stats:
job count min threads max threads
------------------------------------ ------- ------------- -------------
all 1 1 1
copy_batch 1 1 1
copy_imputation 1 1 1
copy_phase 1 1 1
copy_vcf 1 1 1
liftover 1 1 1
plink_clean_up 1 1 1
plink_filter 1 1 1
pre_imputation_check 1 1 1
prepare_vcf 1 1 1
recode_snp_ids 1 1 1
select_bad_samples 1 1 1
single_batch_convert_mapped_to_plink 1 1 1
single_batch_ibis_mapping 1 1 1
vcf_stats 1 1 1
total 15 1 1
[Sun Dec 25 21:58:26 2022]
rule copy_batch:
input: input.vcf.gz
output: vcf/batch1.vcf.gz
jobid: 10
reason: Missing output files: vcf/batch1.vcf.gz
resources: tmpdir=/tmp
cp input.vcf.gz vcf/batch1.vcf.gz
[Sun Dec 25 21:58:26 2022]
rule copy_vcf:
input: vcf/batch1.vcf.gz
output: vcf/batch1_imputation_removed.vcf.gz
jobid: 9
reason: Missing output files: vcf/batch1_imputation_removed.vcf.gz; Input files updated by another job: vcf/batch1.vcf.gz
wildcards: batch=batch1
resources: tmpdir=/tmp
cp vcf/batch1.vcf.gz vcf/batch1_imputation_removed.vcf.gz
[Sun Dec 25 21:58:26 2022]
rule vcf_stats:
input: vcf/batch1.vcf.gz
output: stats/batch1_lifted_vcf.txt, stats/batch1_lifted_vcf.psc
jobid: 12
reason: Missing output files: stats/batch1_lifted_vcf.psc; Input files updated by another job: vcf/batch1.vcf.gz
wildcards: batch=batch1
resources: tmpdir=/tmp
bcftools query --list-samples vcf/batch1.vcf.gz > vcf/batch1_merged_lifted.vcf.samples
bcftools stats -S vcf/batch1_merged_lifted.vcf.samples vcf/batch1.vcf.gz > stats/batch1_lifted_vcf.txt
# PSC means per-sample counts
cat stats/batch1_lifted_vcf.txt | grep '^PSC' > stats/batch1_lifted_vcf.psc
Would remove temporary output vcf/batch1.vcf.gz
[Sun Dec 25 21:58:26 2022]
rule liftover:
input: vcf/batch1_imputation_removed.vcf.gz
output: vcf/batch1_merged_lifted.vcf.gz
log: logs/liftover/liftoverbatch1.log
jobid: 8
reason: Missing output files: vcf/batch1_merged_lifted.vcf.gz; Input files updated by another job: vcf/batch1_imputation_removed.vcf.gz
wildcards: batch=batch1
resources: tmpdir=/tmp, mem_mb=20480
RuleException in rule liftover in line 97 of /src/repo/workflows/preprocess2/../../rules/preprocessing.smk:
NameError: The name 'batch' is unknown in this context. Did you mean 'wildcards.batch'?, when formatting the following:
java -Xmx{params.mem_gb}g -jar /picard/picard.jar LiftoverVcf WARN_ON_MISSING_CONTIG=true MAX_RECORDS_IN_RAM=5000 I={input.vcf} O={output.vcf} CHAIN={LIFT_CHAIN} REJECT=vcf/chr{batch}_rejected.vcf.gz R={GRCH37_FASTA} |& tee -a {log}
Traceback (most recent call last):
File "launcher.py", line 453, in <module>
raise ValueError("Pipeline failed see Snakemake error message for details")
ValueError: Pipeline failed see Snakemake error message for details
Could you please check preprocessing.smk and the other Snakemake files? This looks like a pure Snakemake error: the input file is fine and was validated.
Btw, the wildcards.batch issue also applies to rules/imputation.smk, line 138.
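For context, here is a minimal sketch of why the liftover command fails to format. In a rule's input and output patterns, {batch} is a wildcard, but inside the shell command the value has to be accessed as {wildcards.batch}. The helper format_shell below is a hypothetical, simplified stand-in built on Python's str.format, not Snakemake's actual implementation, and only the REJECT fragment of the command is reproduced.

```python
from types import SimpleNamespace


def format_shell(cmd, **namespace):
    """Hypothetical stand-in for Snakemake's shell formatting (not its real code)."""
    try:
        return cmd.format(**namespace)
    except KeyError as err:
        # str.format raises KeyError for a placeholder with no matching name.
        raise NameError(
            f"The name {err.args[0]!r} is unknown in this context."
        ) from err


# The wildcard values are only reachable through the wildcards object.
wildcards = SimpleNamespace(batch="batch1")

broken = "REJECT=vcf/chr{batch}_rejected.vcf.gz"            # as in the failing rule
fixed = "REJECT=vcf/chr{wildcards.batch}_rejected.vcf.gz"   # suggested form

try:
    format_shell(broken, wildcards=wildcards)
except NameError as err:
    broken_error = str(err)
    print("broken:", broken_error)

fixed_cmd = format_shell(fixed, wildcards=wildcards)
print("fixed: ", fixed_cmd)
```

The bare {batch} placeholder has no matching name in the formatting namespace, whereas {wildcards.batch} is resolved as an attribute lookup on the wildcards object.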
@danilovkiri Hi, Kirill! Thank you for pointing out this issue! I just applied the fixes and tested them, so everything should work now. If you stumble upon any other bugs, please report them; this helps improve the code, and I will respond as quickly as possible. Also, Happy New Year! :santa:
@Jahysama Thank you, Egor. Happy New Year :)