epi2me-labs/wf-single-cell

samtools: unrecognized option '--no-PG'


Operating System

Other Linux (please specify below)

Other Linux

Ubuntu 20.04

Workflow Version

v1.1.0

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow -c nextflow.config run epi2me-labs/wf-single-cell \
    -w $WORK/Barcode_PCR_10x_3prime/workspace \
    -profile singularity \
    --fastq $WORK/Barcode_PCR_10x_3prime.fastq.gz \
    --kit_name 3prime \
    --kit_version v3 \
    --expected_cells 1000 \
    --ref_genome_dir /mnt/home/maestri/references/GRCh38_w_Barcode_region \
    --out_dir $WORK/Barcode_PCR_10x_3prime \
    --matrix_min_genes 0 \
    --matrix_min_cells 0

Workflow Execution - CLI Execution Profile

None

What happened?

In the align_to_ref process, the following command fails:
minimap2 -ax splice -uf --secondary=no --MD -t 4 --junc-bed ref_genes.bed -I 16G genome_index.mmi reads.fastq | samtools view -b --no-PG -t ref_chrom_sizes - | tee >(samtools sort -@ 2 --no-PG - > "Barcode_PCR_10x_3prime_sorted.bam") | seqkit bam -F - 2> bam_info.tsv

Error:

view: unrecognized option '--no-PG'
sort: unrecognized option '--no-PG'

I then opened an interactive session with the same Singularity image used by the process and noticed that a plain samtools call resolved to my own samtools (v1.5), which is on my PATH, rather than to the container's copy. That version does not recognize --no-PG, hence the error. Conversely, when I used /home/epi2melabs/conda/bin/samtools in place of samtools, the command ran smoothly. I suspect this is the cause of the error.
Is there a reason why the full path of each tool is not specified in the pipeline, or why /home/epi2melabs/conda/bin/ was not prepended to the PATH environment variable when the Docker image was built?

EDIT: I forced the pipeline to use samtools in /home/epi2melabs/conda/bin/, but this did not solve the issue.
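
For anyone reproducing this, a minimal sketch of how to confirm which binary the shell picks up inside the container session. The helper name check_tool_origin is hypothetical (it is not part of the workflow); it only relies on the standard command -v lookup and compares the result with the directory where the tool is expected to live.

```shell
# Hypothetical helper: report whether a tool on PATH resolves to the
# expected directory. Returns 0 if it does, 1 if it resolves elsewhere,
# 2 if the tool is not found at all.
check_tool_origin() {
    local tool expected_dir resolved
    tool="$1"
    expected_dir="${2%/}"   # drop a trailing slash, if any
    resolved=$(command -v "$tool") || {
        echo "$tool: not found on PATH"
        return 2
    }
    case "$resolved" in
        "$expected_dir"/*)
            echo "$tool: OK ($resolved)"
            return 0
            ;;
        *)
            echo "$tool: resolved to $resolved, expected under $expected_dir"
            return 1
            ;;
    esac
}
```

Run inside the interactive container session, check_tool_origin samtools /home/epi2melabs/conda/bin would show whether the container's copy or a host copy is being used.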

Thanks in advance,
Simone

Relevant log output

ERROR ~ Error executing process > 'pipeline:align:align_to_ref (2)'

Caused by:
  Process `pipeline:align:align_to_ref (2)` terminated with an error exit status (255)

Command executed:

  minimap2 -ax splice -uf --secondary=no --MD -t 4       --junc-bed ref_genes.bed -I 16G        genome_index.mmi reads.fastq         | samtools view -b --no-PG -t ref_chrom_sizes -         | tee >(samtools sort -@ 2 --no-PG  - > "Barcode_PCR_10x_3prime_sorted.bam")         | seqkit bam -F - 2> bam_info.tsv

  samtools index -@ 4 "Barcode_PCR_10x_3prime_sorted.bam"

Command exit status:
  255

Command output:
  (empty)

Command error:
  WARNING: Ignoring /mnt/home/maestri bind mount: user bind control disabled by system administrator
  WARNING: underlay of /etc/localtime required more than 50 (78) bind mounts
  WARNING: Not mounting current directory: user bind control is disabled by system administrator
  [main_samview] fail to read the header from "-".
  [W::hts_set_opt] Cannot change block size for this format
  samtools sort: failed to read header from "-"

Work dir:
  /mnt/home/maestri/ONT_single_cell_tests/Barcode_PCR_10x_3prime/workspace/8f/afe8041f5f89056afb2a1d2e4181f1

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)
WARN: Tower request field `workflow.errorMessage` exceeds expected size | offending value: `WARNING: Ignoring /mnt/home/maestri bind mount: user bind control disabled by system administrator
WARNING: underlay of /etc/localtime required more than 50 (78) bind mounts
WARNING: Not mounting current directory: user bind control is disabled by system administrator
[main_samview] fail to read the header from "-".
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"`, size: 421 (max: 255)

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

Hi @MaestSi,

This is an issue particular to the use of Singularity/Apptainer that several of our users have observed. You are correct that our workflow tools are installed into the container at /home/epi2melabs/conda/bin/. This path is included in the PATH of the shell that runs when workflow scripts are invoked.

The issue is that Nextflow can end up mounting the host system's /home into the container. This happens when Nextflow determines that /home is the common parent path of files that need to be present in the container. For example, if you have two files, /home/foo and /home/bar, Nextflow will not mount them separately but will prefer to mount simply /home. This mount then stomps on the /home/epi2melabs/conda/bin/ path, meaning that the image's workflow tools are no longer present.
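
As a rough illustration of the collapsing described above, the sketch below computes the longest common parent directory of a set of absolute file paths. This is not Nextflow's actual code; common_parent is a hypothetical helper written only to show why /home/foo and /home/bar end up sharing a single /home mount point.

```shell
# Hypothetical helper (not Nextflow's implementation): print the longest
# common parent directory of the absolute file paths given as arguments.
common_parent() {
    local common path
    common="${1%/*}"          # start with the parent of the first path
    shift
    for path in "$@"; do
        # Shrink the candidate until it is a directory prefix of this path.
        while [ -n "$common" ] && [ "${path#"$common"/}" = "$path" ]; do
            common="${common%/*}"
        done
    done
    # An empty candidate means only the root directory is shared.
    printf '%s\n' "${common:-/}"
}

common_parent /home/foo /home/bar   # prints /home
```

If the result for a run's input files is /home, that run is exposed to the problem described here; inputs kept under a deeper shared directory yield a deeper mount point instead.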

The current workaround is to ensure that your inputs do not have /home as their longest common path (as in the example above). Longer term, we are going to rebuild all our containers with the tools installed under /opt/ or /usr/local.

Hi, I see, thank you for the clarification.
Instead of installing all your tools into a different folder, wouldn't it be enough to write in the Dockerfile:
ENV PATH /home/epi2melabs/conda/bin:$PATH?
This way, with the epi2melabs bin folder ahead of all other folders already in the PATH (the user's home included), the epi2melabs executables would take precedence over tools from outside the container. What do you think of this solution?
Thanks in advance,
Simone

What you are describing is what is already done.

The issue as I've described above is that the /home path of the container essentially no longer exists after Nextflow mounts a host path at /home.

The solution is as I've described, rebuild the container to place the tools somewhere else and avoid Nextflow inadvertently stomping on the path where the programs exist.
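
To illustrate why the order of PATH entries cannot help once the directory is masked, here is a small self-contained sketch. It uses throwaway directories and a made-up tool name (mytool) rather than the real container: the first PATH entry points at a directory that does not exist, standing in for the masked /home/epi2melabs/conda/bin, so the lookup falls through to the next entry.

```shell
# Throwaway sandbox; all names here are made up for the demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/host/bin"
printf '#!/bin/sh\necho host-tool\n' > "$demo/host/bin/mytool"
chmod +x "$demo/host/bin/mytool"

# "$demo/container/bin" is deliberately never created: it plays the role
# of the container's tool directory after a host mount has hidden it.
resolved=$(
    PATH="$demo/container/bin:$demo/host/bin:$PATH"
    command -v mytool
)

# Prints the copy under host/bin, even though container/bin comes first.
echo "$resolved"

rm -rf "$demo"
```

The missing directory is skipped silently, so an older host copy of a tool can be picked up without any warning, which matches the symptom reported in this issue.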