nf-core/hlatyping

The --index option is ignored

micknudsen opened this issue · 10 comments

When I run the command nextflow run nf-core/hlatyping --reads '*_R{1,2}.fastq.gz' -profile docker, it fails in the pre_map_hla step with the following error:

Command error:
  Open failed on file /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna.txt.size: "No such file or directory"
  yara_mapper: Error while opening reference file.

The file does exist, so I thought Docker might not be able to see it. I therefore copied the indices directory to the folder containing my FASTQ files and ran nextflow run nf-core/hlatyping --reads '*_R{1,2}.fastq.gz' --index indices/yara/hla_reference_dna -profile docker.

However, the error remains the same. That is, the pipeline still looks for the file in /Users/micknudsen/.nextflow/.

This is my first time trying nextflow, so I may be missing something completely obvious. I am running this on a Mac.

Hi @micknudsen, thanks for reporting this. Haven't seen this before... Which nextflow and pipeline version are you using?

Hi @christopher-mohr. I just tried to reproduce the error on a different machine, but everything worked perfectly, also with the index in the default location. I will follow up on this issue when I get back to the machine that produced the error.

The issue appears to be caused by running the pipeline on an external hard drive. When I run it on my internal (system) hard drive, there are no problems (except that I will eventually run out of space).

I have tried circumventing the problem by using symbolic links, but that doesn't work.

Could it be a permission issue?

I suppose it has something to do with Docker. Outside Docker, there are no permission issues. My external drive is mounted at /Volumes/ (the standard location on a Mac), and Docker has permission to mount it.


I can use --outdir to write output to the external drive, but when I set --work-dir to point to the external drive, it starts generating "work" output there and then fails to locate the index. Here is an example of the --index option being ignored:

% nextflow run nf-core/hlatyping --reads '*_R{1,2}.fastq.gz' --index /Volumes/G-DRIVE/temp/indices/yara/hla_reference_dna -profile docker
N E X T F L O W  ~  version 20.01.0
Launching `nf-core/hlatyping` [gloomy_legentil] - revision: bf5d0c2d46 [master]
WARN: The access of `config` object is deprecated
WARN: Access to undefined parameter `genome` -- Initialise it to a default value eg. `params.genome = some_value`
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/hlatyping v1.1.5
----------------------------------------------------
Pipeline Release  : master
Run Name          : gloomy_legentil
File Type         : Other (fastq, fastq.gz, ...)
Seq Type          : dna
Index Location    : /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna
IP solver         : glpk
Enumerations      : 1
Beta              : 0.009
Prefix            : hla_run
Max Memory        : 128 GB
Max CPUs          : 16
Max Time          : 10d
Output dir        : ./results
Working dir       : /Volumes/G-DRIVE/temp/work
Reads             : *_R{1,2}.fastq.gz
Fasta Ref         : null
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - nfcore/hlatyping:1.1.5
Launch dir        : /Volumes/G-DRIVE/temp
Script dir        : /Users/micknudsen/.nextflow/assets/nf-core/hlatyping
User              : micknudsen
Config Profile    : docker
----------------------------------------------------
executor >  local (6)
[41/bbbd04] process > unzip                 [100%] 1 of 1 ✔
[af/2cf9cb] process > make_ot_config        [100%] 1 of 1 ✔
[59/a9a3c7] process > pre_map_hla           [100%] 1 of 1, failed: 1 ✘
[-        ] process > run_optitype          -
[e4/7607c5] process > output_documentation  [100%] 1 of 1 ✔
[e3/f94f5a] process > get_software_versions [100%] 1 of 1 ✔
[c4/48a0de] process > multiqc               [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
[nf-core/hlatyping] Pipeline completed with errors
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'pre_map_hla (1)'

Caused by:
  Process `pre_map_hla (1)` terminated with an error exit status (1)

Command executed:

  yara_mapper -e 3 -t 1 -f bam /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna unzipped_1.fastq unzipped_2.fastq > output.bam
  samtools view -@ 1 -h -F 4 -f 0x40 -b1 output.bam > mapped_1.bam
  samtools view -@ 1 -h -F 4 -f 0x80 -b1 output.bam > mapped_2.bam

Command exit status:
  1

Command output:
  (empty)

Command error:
  Open failed on file /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/hla_reference_dna.txt.size: "No such file or directory"
  yara_mapper: Error while opening reference file.

Work dir:
  /Volumes/G-DRIVE/temp/work/59/a9a3c71c33a2f7b20e9236a725d1f9

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

I don't know much (anything) about Docker, but I think this line in .command.run may provide a clue:

docker run -i --memory 8192m -e "NXF_DEBUG=${NXF_DEBUG:=0}" -v /Volumes/G-DRIVE/temp/work:/Volumes/G-DRIVE/temp/work -v /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/bin:/Users/micknudsen/.nextflow/assets/nf-core/hlatyping/bin -v "$PWD":"$PWD" -w "$PWD" --entrypoint /bin/bash --name $NXF_BOXID nfcore/hlatyping:1.1.5 -c "eval $(nxf_container_env); /bin/bash /Volumes/G-DRIVE/temp/work/4a/bbbbb22fdc13e43582c131fb00bcae/.command.run nxf_trace"

Shouldn't there be a -v entry pointing to the index?
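One way to check this is to compare the index path against the -v bind mounts in the command above. The following is a minimal Python sketch, not part of Nextflow or the pipeline. The helper names `host_mounts` and `is_visible_in_container` are hypothetical, the command string is shortened, and the `-v "$PWD":"$PWD"` mount is left out because it depends on the shell's current directory.

```python
import shlex
from pathlib import PurePosixPath


def host_mounts(docker_cmd: str) -> list[str]:
    """Return the host side of every `-v host:container` argument."""
    tokens = shlex.split(docker_cmd)
    mounts = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-v":
            mounts.append(value.split(":", 1)[0])
    return mounts


def is_visible_in_container(path: str, mounts: list[str]) -> bool:
    """True if `path` equals or lies below one of the mounted host directories."""
    target = PurePosixPath(path)
    return any(
        target == PurePosixPath(m) or PurePosixPath(m) in target.parents
        for m in mounts
    )


# Shortened copy of the docker command from .command.run (assumption: only
# the two explicit -v mounts are kept; "$PWD" is omitted).
cmd = (
    "docker run -i "
    "-v /Volumes/G-DRIVE/temp/work:/Volumes/G-DRIVE/temp/work "
    "-v /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/bin:"
    "/Users/micknudsen/.nextflow/assets/nf-core/hlatyping/bin "
    "nfcore/hlatyping:1.1.5"
)
index = (
    "/Users/micknudsen/.nextflow/assets/nf-core/hlatyping/"
    "data/indices/yara/hla_reference_dna"
)

# Prints whether the index prefix falls under any mounted host directory.
print(is_visible_in_container(index, host_mounts(cmd)))
```

If this reports that the index is not covered by any mount, the container cannot open the file even though it exists on the host, which would match the "No such file or directory" message.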

As far as I remember, --index is only used with the --bam parameter.

Which files are available in the directory /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/? Which permissions are set in this folder?

The index path is set in nextflow.config as the following:
index = "$baseDir/data/indices/yara/hla_reference_dna"

This is really odd. I tried manually editing nextflow.config to use the index copy on my external drive, however the problem persists, even though the file certainly does exist:

Command error:
  Open failed on file /Volumes/G-DRIVE/temp/indices/yara/hla_reference_dna.txt.size: "No such file or directory"
  yara_mapper: Error while opening reference file.

Work dir:
  /Volumes/G-DRIVE/temp/work/ff/c83efdfed9088c895c99a750d4a545

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

(nextflow) micknudsen@Michael-Knudsen-Home temp % ll /Volumes/G-DRIVE/temp/indices/yara/hla_reference_dna.txt.size
-rw-r--r-- 1 micknudsen staff 24 Apr 29 20:58 /Volumes/G-DRIVE/temp/indices/yara/hla_reference_dna.txt.size

For the first part of your question, here are the contents of my indices/yara folder:

% ll /Users/micknudsen/.nextflow/assets/nf-core/hlatyping/data/indices/yara/
total 40840
-rw-r--r-- 1 micknudsen staff  4202008 Apr 29 20:32 hla_reference_dna.lf.drp
-rw-r--r-- 1 micknudsen staff        1 Apr 29 20:32 hla_reference_dna.lf.drs
-rw-r--r-- 1 micknudsen staff  8404032 Apr 29 20:32 hla_reference_dna.lf.drv
-rw-r--r-- 1 micknudsen staff       20 Apr 29 20:32 hla_reference_dna.lf.pst
-rw-r--r-- 1 micknudsen staff   186443 Apr 29 20:32 hla_reference_dna.rid.concat
-rw-r--r-- 1 micknudsen staff    44720 Apr 29 20:32 hla_reference_dna.rid.limits
-rw-r--r-- 1 micknudsen staff  4202008 Apr 29 20:32 hla_reference_dna.sa.ind
-rw-r--r-- 1 micknudsen staff        4 Apr 29 20:32 hla_reference_dna.sa.len
-rw-r--r-- 1 micknudsen staff 10114722 Apr 29 20:32 hla_reference_dna.sa.val
-rw-r--r-- 1 micknudsen staff  6398808 Apr 29 20:32 hla_reference_dna.txt.concat
-rw-r--r-- 1 micknudsen staff    89440 Apr 29 20:32 hla_reference_dna.txt.limits
-rw-r--r-- 1 micknudsen staff       24 Apr 29 20:32 hla_reference_dna.txt.size
-rw-r--r-- 1 micknudsen staff  1003576 Apr 29 20:32 hla_reference_rna.lf.drp
-rw-r--r-- 1 micknudsen staff        1 Apr 29 20:32 hla_reference_rna.lf.drs
-rw-r--r-- 1 micknudsen staff  2007168 Apr 29 20:32 hla_reference_rna.lf.drv
-rw-r--r-- 1 micknudsen staff       20 Apr 29 20:32 hla_reference_rna.lf.pst
-rw-r--r-- 1 micknudsen staff    58784 Apr 29 20:32 hla_reference_rna.rid.concat
-rw-r--r-- 1 micknudsen staff    29360 Apr 29 20:32 hla_reference_rna.rid.limits
-rw-r--r-- 1 micknudsen staff  1003576 Apr 29 20:32 hla_reference_rna.sa.ind
-rw-r--r-- 1 micknudsen staff        4 Apr 29 20:32 hla_reference_rna.sa.len
-rw-r--r-- 1 micknudsen staff  2421840 Apr 29 20:32 hla_reference_rna.sa.val
-rw-r--r-- 1 micknudsen staff  1526472 Apr 29 20:32 hla_reference_rna.txt.concat
-rw-r--r-- 1 micknudsen staff    58720 Apr 29 20:32 hla_reference_rna.txt.limits
-rw-r--r-- 1 micknudsen staff       24 Apr 29 20:32 hla_reference_rna.txt.size

This is most likely a bug:

The process yara_index uses the parameters base_index + full_index to define the full index path, whereas our checks only verify the variable index.

See here:

full_index = params.base_index + params.seqtype

One should probably fix this by at least making the variable names consistent, so that a user can override them on the command line when needed.
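To make the mismatch concrete, here is a simplified Python model of the behaviour described above. It is not the pipeline's Groovy code. The `base_index` value is an assumption inferred from the "Index Location" and "Seq Type" lines in the log, and the function names `mapper_index_current` and `mapper_index_consistent` are hypothetical. The second function shows one possible way to honour an explicit index, not necessarily what the eventual fix does.

```python
DEFAULTS = {
    # Assumed default, inferred from the log output earlier in the thread.
    "base_index": (
        "/Users/micknudsen/.nextflow/assets/nf-core/hlatyping/"
        "data/indices/yara/hla_reference_"
    ),
    "seqtype": "dna",
    "index": None,
}


def mapper_index_current(params: dict) -> str:
    """Model of `full_index = params.base_index + params.seqtype`.

    The `index` entry is never read, so overriding it has no effect.
    """
    return params["base_index"] + params["seqtype"]


def mapper_index_consistent(params: dict) -> str:
    """One possible alternative: prefer an explicit `index`, else compose it."""
    return params["index"] or params["base_index"] + params["seqtype"]


# Simulate passing --index on the command line.
params = {**DEFAULTS, "index": "/Volumes/G-DRIVE/temp/indices/yara/hla_reference_dna"}

print(mapper_index_current(params))     # still the default location
print(mapper_index_consistent(params))  # the user-supplied path
```

In this model the first function returns the path under ~/.nextflow regardless of the override, which is the symptom shown in the log.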

Fixed by #79.