StaPH-B/docker-builds

[Bug]: Set snakemake cache directory for pangolin

Closed this issue · 3 comments

What container were you trying to use, and how were you attempting to use it?

I've run into a problem with the latest version of pangolin, both when I run it locally with singularity (even with the --no-home flag) and when nextflow runs it with singularity (the error below is copied from the nextflow run). I think this is caused by snakemake rather than by pangolin or docker.

Workflow execution completed unsuccessfully
Error executing process > 'CECRET:sarscov2:pangolin (SARS-CoV-2 lineage Determination)'

Caused by:
  Missing output file(s) `pangolin/lineage_report.csv` expected by process `CECRET:sarscov2:pangolin (SARS-CoV-2 lineage Determination)`

Command executed:

  mkdir -p pangolin logs/CECRET:sarscov2:pangolin
  log=logs/CECRET:sarscov2:pangolin/CECRET:sarscov2:pangolin.9c311c56-c433-4e9a-8985-13bc59367c52.log
  
  date > $log
  pangolin --all-versions >> $log
  
  for fasta in SRR13957170.consensus.fa SRR13957177S.consensus.fa SRR13957125.consensus.fa
  do
    cat $fasta >> ultimate_fasta.fasta
  done
  
  pangolin        --threads 4       --outdir pangolin       ultimate_fasta.fasta       | tee -a $log
  cp ultimate_fasta.fasta pangolin/combined.fasta

Command exit status:
  0

Command output:
  ****
  Pangolin running in usher mode.
  ****
  Maximum ambiguity allowed is 0.3.
  ****
  Query file:	ultimate_fasta.fasta
  ****
  Data files found:
  usher_pb:	/opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
  ****

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Traceback (most recent call last):
    File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/__init__.py", line 587, in snakemake
      workflow = Workflow(
    File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/workflow.py", line 242, in __init__
      self.sourcecache = SourceCache()
    File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/sourcecache.py", line 358, in __init__
      os.makedirs(self.cache, exist_ok=True)
    File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
      makedirs(head, exist_ok=exist_ok)
    File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
      makedirs(head, exist_ok=exist_ok)
    File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
      makedirs(head, exist_ok=exist_ok)
    [Previous line repeated 1 more time]
    File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 223, in makedirs
      mkdir(name, mode)
  OSError: [Errno 30] Read-only file system: '/home/eriny'

Work dir:
  /Volumes/IDGenomics_NAS/Bioinformatics/eriny/testing_cecret/2023-11-15/work/f6/bef84348e3d584d271f7e9b226d4eb

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I think this can be resolved by setting the snakemake cache directory (XDG_CACHE_HOME?) in the image. I'm traveling this week and next, so I can't test this, but I think adding

RUN mkdir /scratch
ENV XDG_CACHE_HOME=/scratch

might resolve the issue.
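
One way to check this diagnosis before rebuilding the image is to test, from a shell inside the container, whether the cache location can be created at all. The sketch below assumes the cache lives in a `snakemake` directory under `$XDG_CACHE_HOME`, falling back to `$HOME/.cache` (the usual XDG default), which is consistent with the traceback above. The function names `cache_root` and `cache_is_writable` are made up for illustration and are not part of pangolin or snakemake.

```shell
#!/bin/sh
# Assumption: snakemake's cache root follows the XDG convention,
# i.e. $XDG_CACHE_HOME if set, otherwise $HOME/.cache.
cache_root() {
  printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}"
}

# Succeeds only if a "snakemake" directory can be created (or already
# exists) under the cache root, mirroring the os.makedirs call that fails.
cache_is_writable() {
  mkdir -p "$(cache_root)/snakemake" 2>/dev/null
}

if cache_is_writable; then
  echo "cache dir OK: $(cache_root)/snakemake"
else
  echo "cannot create $(cache_root)/snakemake" >&2
fi
```

If this reports a failure with the default settings but succeeds after `export XDG_CACHE_HOME=/some/writable/dir`, that would support setting the variable in the image.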

I've created an image at quay.io/uphl/pangolin:latest which is the 'latest' version of the pangolin dockerfile with the cache directory set. I can test this with singularity when I'm done traveling (December 11ish), but I wouldn't mind feedback before then.

Thanks for making the image available to test.

I tested the docker image via one of Theiagen's WDL workflows in Terra and it ran without issue, likely because cromwell runs the WDLs as root (AFAIK).

I also tested the docker image locally on a FASTA file, but ran into a permissions error when I ran the container as my Linux user (instead of root):

$ docker run --rm=True -u $(id -u):$(id -g) -v $(pwd):/data -ti quay.io/uphl/pangolin:latest /bin/bash
(pangolin) I have no name!@e30f38847d0d:/data$ pangolin EPI_ISL_18545562_BA.2.86.2.fasta 
****
Pangolin running in usher mode.
****
Maximum ambiguity allowed is 0.3.
****
Query file:     /data/EPI_ISL_18545562_BA.2.86.2.fasta
****
Data files found:
usher_pb:       /opt/conda/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
****
Traceback (most recent call last):
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/__init__.py", line 587, in snakemake
    workflow = Workflow(
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/workflow.py", line 242, in __init__
    self.sourcecache = SourceCache()
  File "/opt/conda/envs/pangolin/lib/python3.8/site-packages/snakemake/sourcecache.py", line 358, in __init__
    os.makedirs(self.cache, exist_ok=True)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/pangolin/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/scratch/snakemake'

Would it be better to set the cache dir variable to a directory that's writable by all users? Perhaps use /tmp? I imagine that you might hit this issue with singularity too, depending on which directories are mounted into the container at runtime.
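
As a rough sketch of the "writable to all users" idea, the commands below create a cache directory with the same mode as /tmp (1777: world-writable plus the sticky bit) and point XDG_CACHE_HOME at it. `CACHE_DIR` is a hypothetical variable that defaults to a throwaway temp path so the snippet can be tried outside the image; in a Dockerfile the same `mkdir` and `chmod` would go in a `RUN` step and the `export` would become an `ENV` line. This is only an illustration of the suggestion, not necessarily what the eventual fix did. Note that singularity images are normally read-only at runtime while /tmp is typically bind-mounted from the host, so pointing the variable at /tmp may be the more robust choice there.

```shell
#!/bin/sh
# Hypothetical cache location; in the image this would be e.g. /scratch.
CACHE_DIR="${CACHE_DIR:-$(mktemp -d)/scratch}"

mkdir -p "$CACHE_DIR"
# 1777 = rwx for owner, group and others, plus the sticky bit (same as /tmp),
# so a container started with an arbitrary UID (docker run -u ...) can write here.
chmod 1777 "$CACHE_DIR"

# Point cache-aware tools at the directory.
export XDG_CACHE_HOME="$CACHE_DIR"

# Show the resulting permissions for a quick visual check.
ls -ld "$XDG_CACHE_HOME"
```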

Closed via PR #812.