[Bug]: Set snakemake cache directory for pangolin
What container were you trying to use, and how were you attempting to use it?
I've run into a problem with the latest version of pangolin when I run it locally with Singularity (even with the --no-home flag) and when I run it through Nextflow with Singularity (which is where the error below is copied from). I think this is due to Snakemake rather than pangolin or Docker.
Workflow execution completed unsuccessfully
Error executing process > 'CECRET:sarscov2:pangolin (SARS-CoV-2 lineage Determination)'
Caused by:
Missing output file(s) `pangolin/lineage_report.csv` expected by process `CECRET:sarscov2:pangolin (SARS-CoV-2 lineage Determination)`
Command executed:
mkdir -p pangolin logs/CECRET:sarscov2:pangolin
log=logs/CECRET:sarscov2:pangolin/CECRET:sarscov2:pangolin.9c311c56-c433-4e9a-8985-13bc59367c52.log
date > $log
pangolin --all-versions >> $log
for fasta in SRR13957170.consensus.fa SRR13957177S.consensus.fa SRR13957125.consensus.fa
do
cat $fasta >> ultimate_fasta.fasta
done
pangolin --threads 4 --outdir pangolin ultimate_fasta.fasta | tee -a $log
cp ultimate_fasta.fasta pangolin/combined.fasta
Command exit status:
0
Command output:
****
Pangolin running in usher mode.
****
Maximum ambiguity allowed is 0.3.
****
Query file: ultimate_fasta.fasta
****
Data files found:
usher_pb: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
****
Command error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Traceback (most recent call last):
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/__init__.py", line 587, in snakemake
    workflow = Workflow(
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/workflow.py", line 242, in __init__
    self.sourcecache = SourceCache()
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/sourcecache.py", line 358, in __init__
    os.makedirs(self.cache, exist_ok=True)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/home/eriny'
Work dir:
/Volumes/IDGenomics_NAS/Bioinformatics/eriny/testing_cecret/2023-11-15/work/f6/bef84348e3d584d271f7e9b226d4eb
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
I think this can be resolved by setting the Snakemake cache directory (XDG_CACHE_HOME?) in the image. I'm travelling this week and next, so I can't test this, but I think adding
RUN mkdir /scratch
ENV XDG_CACHE_HOME=/scratch
might resolve the issue.
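As a quick way to check that idea, here is a minimal shell sketch that reports which base cache directory applies to the current user and whether that user could create it. It assumes Snakemake follows the XDG Base Directory convention (`$XDG_CACHE_HOME` if set, otherwise `$HOME/.cache`), which is consistent with the `/home/eriny` path in the traceback above but is not confirmed here; the function names `cache_root` and `check_cache_root` are hypothetical.

```shell
# Hypothetical helpers, assuming the XDG Base Directory convention:
# use XDG_CACHE_HOME when it is set and non-empty, otherwise $HOME/.cache.
cache_root() {
  printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}"
}

# Report whether the current user could create (or write to) that directory.
check_cache_root() {
  local dir probe
  dir="$(cache_root)"
  probe="$dir"
  # Walk up to the nearest existing ancestor: that is the first directory
  # `mkdir -p` (or Python's os.makedirs) would need write access to.
  while [ ! -e "$probe" ]; do
    probe="$(dirname "$probe")"
  done
  if [ -d "$probe" ] && [ -w "$probe" ]; then
    echo "writable: $dir"
  else
    echo "not writable: $dir (blocked at $probe)"
  fi
}
```

Running `check_cache_root` inside the container, as the same user the workflow runs as, should show whether the cache path is blocked before pangolin gets to its Snakemake step.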
I've created an image at quay.io/uphl/pangolin:latest, which is the 'latest' version of the pangolin Dockerfile with the cache directory set. I can test it with Singularity when I'm done traveling (around December 11), but I'd welcome feedback before then.
Thanks for making the image available to test.
I tested the Docker image via one of Theiagen's WDL workflows in Terra, and it ran without issue, likely because Cromwell runs the WDLs as root (AFAIK).
I also tested the Docker image locally on a FASTA file, but ran into a permissions error when I ran it as my Linux user (instead of root):
$ docker run --rm=True -u $(id -u):$(id -g) -v $(pwd):/data -ti quay.io/uphl/pangolin:latest /bin/bash
(pangolin) I have no name!@e30f38847d0d:/data$ pangolin EPI_ISL_18545562_BA.2.86.2.fasta
****
Pangolin running in usher mode.
****
Maximum ambiguity allowed is 0.3.
****
Query file: /data/EPI_ISL_18545562_BA.2.86.2.fasta
****
Data files found:
usher_pb: /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
****
Traceback (most recent call last):
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/__init__.py", line 587, in snakemake
    workflow = Workflow(
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/workflow.py", line 242, in __init__
    self.sourcecache = SourceCache()
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/sourcecache.py", line 358, in __init__
    os.makedirs(self.cache, exist_ok=True)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/scratch/snakemake'
Would it be better to set the cache directory variable to a directory that's writable by all users, perhaps /tmp? I imagine you might hit this issue with Singularity too, depending on which directories are mounted into the container at runtime.
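One way to act on the /tmp suggestion without hard-coding a single shared path is to choose the cache directory at runtime. This is a minimal shell sketch, assuming the Snakemake step inside pangolin only needs `XDG_CACHE_HOME` to point at some writable directory; `ensure_writable_cache` is a hypothetical helper, not something the pangolin image provides.

```shell
# Hypothetical helper: export an XDG_CACHE_HOME the current user can write to.
ensure_writable_cache() {
  local current="${XDG_CACHE_HOME:-$HOME/.cache}"
  # mkdir -p is a no-op when the directory already exists
  if mkdir -p "$current" 2>/dev/null && [ -w "$current" ]; then
    XDG_CACHE_HOME="$current"
  else
    # Fall back to a fresh private directory under $TMPDIR (default /tmp)
    XDG_CACHE_HOME="$(mktemp -d "${TMPDIR:-/tmp}/cache.XXXXXX")" || return 1
  fi
  export XDG_CACHE_HOME
}

# Example: ensure_writable_cache && pangolin --outdir pangolin ultimate_fasta.fasta
```

Because `mktemp -d` creates a new directory owned by the calling user, different users (or different `-u` UIDs) would not collide the way they could on a fixed path such as `/tmp/snakemake`, where the first user to create it owns it. The trade-off is that the cache is not reused between runs.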