Error in rules get_read_stats and run_decontamination during atlas run qc
Hello there,
I have successfully set up atlas on a Torque cluster, but ran into this error while trying to run qc on a single sample:
atlas run qc --working-dir data/atlas_workflow/ --profile cluster -c data/atlas_workflow/config.yaml --conda-prefix data/99_databases/conda_envs/
[Atlas] INFO: Atlas version: 2.15.1
Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Creating conda environment /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/required_packages.yaml...
Downloading and installing remote packages.
Environment for /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/required_packages.yaml created (location: data/99_databases/conda_envs/9549dec21e9f8371b353aebcd66776ec_)
Creating conda environment /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/report.yaml...
Downloading and installing remote packages.
Environment for /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/report.yaml created (location: data/99_databases/conda_envs/2a1e961ee3625284022303553f52c511_)
Using shell: /usr/bin/bash
Provided cluster nodes: 40
Singularity containers: ignored
Job stats:
job count min threads max threads
------------------------- ------- ------------- -------------
build_qc_report 1 1 1
calculate_insert_size 1 10 10
combine_insert_stats 1 1 1
combine_read_counts 1 1 1
combine_read_length_stats 1 1 1
finalize_sample_qc 1 1 1
get_read_stats 3 10 10
qc 1 1 1
qcreads 1 1 1
run_decontamination 1 10 10
write_read_counts 1 1 1
total 13 1 10
Select jobs to execute...
[Mon May 8 14:25:48 2023]
rule run_decontamination:
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz, ref/genome/1/summary.txt
output: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_se.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_se.fastq.gz, CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt
log: CwM1/logs/QC/decontamination.log
jobid: 3
benchmark: logs/benchmarks/QC/decontamination/CwM1.txt
reason: Missing output files: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz
wildcards: sample=CwM1
threads: 10
resources: mem_mb=120000, mem_mib=114441, disk_mb=9679, disk_mib=9231, tmpdir=<TBD>, mem=120, java_mem=102, time_min=60, runtime=3600
CLUSTER: 2023-05-08 14:25:48 Automatically choose best queue to submit
CLUSTER: 2023-05-08 14:25:50 Choose queue: thinnode
CLUSTER: 2023-05-08 14:25:50 parameter 'nodes' not in keymapping! It would be better if you add the key to the file: /home/projects/dtu_00055/people/linroa/cluster/key_mapping.yaml
I try without the key!
CLUSTER: 2023-05-08 14:25:50 submit command: qsub -A dtu_00055 -N run_decontamination -l nodes=1:ppn=10 -l walltime=6000 -l mem=120000mb -l nodes=1:thinnode /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.run_decontamination.3.sh
Submitted job 3 with external jobid '42521368'.
[Mon May 8 14:25:50 2023]
rule get_read_stats:
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: CwM1/logs/QC/read_stats/filtered.log
jobid: 6
reason: Missing output files: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
wildcards: sample=CwM1, step=filtered
priority: 30
threads: 10
resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, tmpdir=<TBD>, mem=120, java_mem=102, time_min=60, runtime=3600
CLUSTER: 2023-05-08 14:25:50 Automatically choose best queue to submit
CLUSTER: 2023-05-08 14:25:51 Choose queue: thinnode
CLUSTER: 2023-05-08 14:25:51 parameter 'nodes' not in keymapping! It would be better if you add the key to the file: /home/projects/dtu_00055/people/linroa/cluster/key_mapping.yaml
I try without the key!
CLUSTER: 2023-05-08 14:25:51 submit command: qsub -A dtu_00055 -N get_read_stats -l nodes=1:ppn=10 -l walltime=6000 -l mem=120000mb -l nodes=1:thinnode /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.get_read_stats.6.sh
Submitted job 6 with external jobid '42521370'.
[Mon May 8 14:26:38 2023]
Error in rule run_decontamination:
jobid: 3
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz, ref/genome/1/summary.txt
output: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_se.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_se.fastq.gz, CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt
log: CwM1/logs/QC/decontamination.log (check log file(s) for error details)
conda-env: /home/projects/dtu_00055/data/atlas_workflow/data/99_databases/conda_envs/9549dec21e9f8371b353aebcd66776ec_
shell:
if [ "true" = true ] ; then
bbsplit.sh in1=CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz in2=CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz outu1=CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz outu2=CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz basename="CwM1/sequence_quality_control/contaminants/%_R#.fastq.gz" maxindel=20 minratio=0.65 minhits=1 ambiguous=best refstats=CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt threads=10 k=13 local=t pigz=t unpigz=t ziplevel=9 -Xmx102G 2> CwM1/logs/QC/decontamination.log
fi
bbsplit.sh in=CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz outu=CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz basename="CwM1/sequence_quality_control/contaminants/%_se.fastq.gz" maxindel=20 minratio=0.65 minhits=1 ambiguous=best refstats=CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt append interleaved=f threads=10 k=13 local=t pigz=t unpigz=t ziplevel=9 -Xmx102G 2>> CwM1/logs/QC/decontamination.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: 42521368
Error executing rule run_decontamination on cluster (jobid: 3, external: 42521368, jobscript: /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.run_decontamination.3.sh). For error details see the cluster log and the log files of the involved rule(s).
[Mon May 8 14:26:38 2023]
Error in rule get_read_stats:
jobid: 6
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: CwM1/logs/QC/read_stats/filtered.log (check log file(s) for error details)
cluster_jobid: 42521370
Error executing rule get_read_stats on cluster (jobid: 6, external: 42521370, jobscript: /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.get_read_stats.6.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: .snakemake/log/2023-05-08T140746.785390.snakemake.log
[Atlas] CRITICAL: Command 'snakemake --snakefile /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/Snakefile --directory /home/projects/dtu_00055/data/atlas_workflow --rerun-triggers mtime --jobs 40 --rerun-incomplete --configfile '/home/projects/dtu_00055/data/atlas_workflow/config.yaml' --nolock --profile /home/projects/dtu_00055/people/linroa/cluster/ --use-conda --conda-prefix /home/projects/dtu_00055/data/99_databases/conda_envs --scheduler greedy qc --conda-prefix data/99_databases/conda_envs/ ' returned non-zero exit status 1.
The error log of get_read_stats looks like this:
Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, mem=120, java_mem=102, time_min=60
Singularity containers: ignored
Select jobs to execute...
[Mon May 8 14:26:31 2023]
rule get_read_stats:
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: CwM1/logs/QC/read_stats/filtered.log
jobid: 0
reason: Missing output files: CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv, CwM1/sequence_quality_control/read_stats/filtered.zip
wildcards: sample=CwM1, step=filtered
priority: 30
resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, tmpdir=/scratch/42521370, mem=120, java_mem=102, time_min=60, runtime=3600
[Mon May 8 14:26:31 2023]
Error in rule get_read_stats:
jobid: 0
input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: CwM1/logs/QC/read_stats/filtered.log (check log file(s) for error details)
RuleException:
CalledProcessError in file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk, line 148:
Command 'set -euo pipefail; mkdir -p CwM1/sequence_quality_control/read_stats/filtered/pe 2> CwM1/logs/QC/read_stats/filtered.log ; reformat.sh in1=CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz in2=CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz bhist=CwM1/sequence_quality_control/read_stats/filtered/pe/base_hist.txt qhist=CwM1/sequence_quality_control/read_stats/filtered/pe/quality_by_pos.txt lhist=CwM1/sequence_quality_control/read_stats/filtered/pe/readlength.txt gchist=CwM1/sequence_quality_control/read_stats/filtered/pe/gc_hist.txt gcbins=auto bqhist=CwM1/sequence_quality_control/read_stats/filtered/pe/boxplot_quality.txt threads=1 overwrite=true -Xmx102G 2> >(tee -a CwM1/logs/QC/read_stats/filtered.log CwM1/sequence_quality_control/read_stats/filtered/pe/read_stats.tmp )' returned non-zero exit status 127.
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk", line 178, in __rule_get_read_stats
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk", line 148, in get_read_stats
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
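If I understand it correctly, exit status 127 means the shell could not find the command, i.e. reformat.sh from the required_packages conda environment was not on PATH on the compute node. A minimal sketch of how one can check this from a compute node (the conda location is a placeholder; the environment path is the one reported further above):
# /path/to/conda is a placeholder for wherever conda is installed
source /path/to/conda/etc/profile.d/conda.sh
conda activate /home/projects/dtu_00055/data/atlas_workflow/data/99_databases/conda_envs/9549dec21e9f8371b353aebcd66776ec_
which reformat.sh   # should print a path inside the activated environment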
The error log for run_decontamination looks like this:
Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
/bin/bash: conda: command not found
Traceback (most recent call last):
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/sna
kemake/__init__.py", line 757, in snakemake
success = workflow.execute(
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/sna
kemake/workflow.py", line 935, in execute
dag.create_conda_envs(
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/sna
kemake/dag.py", line 351, in create_conda_envs
env.create(dryrun)
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/sna
kemake/deployment/conda.py", line 405, in create
pin_file = self.pin_file
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/sna
kemake/common/__init__.py", line 221, in __get__
value = self.method(instance)
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 109, in pin_file
f".{self.conda.platform}.pin.txt"
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/common/__init__.py", line 221, in __get__
value = self.method(instance)
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 102, in conda
return Conda(
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 667, in __init__
shell.check_output(self._get_cmd(f"conda info --json"), text=True)
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/shell.py", line 63, in check_output
return sp.check_output(cmd, shell=True, executable=executable, **kwargs)
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'conda info --json' returned non-zero exit status 127.
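So run_decontamination fails one step earlier: /bin/bash: conda: command not found, i.e. snakemake cannot even run conda info --json to prepare the environment on the compute node. One way to check whether conda is visible in a freshly submitted job would be something like this (account and resource flags copied from the submit commands above; otherwise illustrative and untested):
# the result ends up in the job's stdout/stderr files in the submission directory
echo 'which conda || echo "conda not found"; conda info --json > /dev/null && echo "conda OK"' | qsub -A dtu_00055 -l nodes=1:ppn=1 -l walltime=600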
The cluster log is empty.
I am using atlas version 2.15.1.
I saw this issue, which reported errors in the same rules, but no solution was mentioned before it was resolved.
I hope you can help, since I am quite stuck and was excited to use atlas on my samples!
Best regards,
Both errors are related in that the software (e.g. conda) is no longer found once the job is submitted to the cluster.
But it is strange that this happens specifically for these rules.
Could you try simply updating to atlas v2.15.2?
Yes, this is what I suspected too. It seems the software (conda, atlas) that I load in my submission script is maybe not loaded in the subsequent jobs submitted by atlas?
I have very limited rights for software installation on the cluster, in case that could play a role here.
I will try updating to atlas 2.15.2 and see if it helps. I will report back!
Edit: I also think it is happening for these rules specifically because I ran part of the qc locally before moving to the cluster, so these are probably the next rules in the pipeline from the point where the local execution stopped. Sorry for not mentioning this earlier; I had forgotten that I did it.
There is also a snakemake argument that you can supply to the atlas command. It is intended for internal use, but it might fix your problem:
--conda-base-path
abs path of conda base installation (home of conda, mamba, activate)
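For example, appended to your original command (extra arguments are passed on to snakemake; /path/to/conda/base is a placeholder for wherever conda, mamba and activate live on your system):
atlas run qc --working-dir data/atlas_workflow/ --profile cluster \
    -c data/atlas_workflow/config.yaml \
    --conda-prefix data/99_databases/conda_envs/ \
    --conda-base-path /path/to/conda/base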
I updated to atlas 2.15.2 with no improvement, and also tried adding --conda-base-path, to no avail.
In the end I managed to get it running in single-machine execution mode (see the sketch below), which is not ideal, but it works.
No conda environments or software are available once atlas submits a new job to the queue, and I did not manage to find a way to fix that.
I am going to close this issue since I found a workaround, but feel free to reopen with a real solution any time.
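To illustrate what I mean by single-machine execution: everything runs inside one Torque job instead of letting atlas submit jobs itself. A rough sketch (resources and the heredoc form are placeholders based on my setup, not exactly what I ran):
qsub -A dtu_00055 -l nodes=1:ppn=40,mem=180gb,walltime=24:00:00 <<'EOF'
cd /home/projects/dtu_00055/data/atlas_workflow
atlas run qc --working-dir . --jobs 40 --conda-prefix /home/projects/dtu_00055/data/99_databases/conda_envs/
EOF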