fmalmeida/bacannot

Issue in KOFam database download

minhtrung1997 opened this issue · 13 comments

Describe the bug
When I run Nextflow to download the databases, the following error occurs:

  profiles/K23070.hmm
  profiles/K23072.hmm
  profiles/prokaryote.hal
  profiles/eukaryote.hal

Command error:
  tar: profiles/K25306.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted
  tar: profiles/K25336.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted
  tar: profiles/K25338.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted

I had to download it from the terminal, using your script but skipping

chmod a+rw profiles.tar.gz ko_list

and

chown -R root:\$(id -g) profiles
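The `Cannot change ownership to uid 500, gid 500` messages above come from tar trying to restore the owner recorded in the archive. GNU tar does that by default when it runs as root, which is typical inside a container. The `--no-same-owner` flag skips that step, so the files simply belong to whoever extracts them. A minimal sketch, assuming GNU tar and an already downloaded `profiles.tar.gz`. The function name `extract_without_owner` is hypothetical and not part of bacannot:

```shell
#!/usr/bin/env bash
# Sketch: extract a KOfam-style archive without restoring the owner stored in it.
# Assumes GNU tar; extract_without_owner is a hypothetical helper, not part of bacannot.
set -euo pipefail

extract_without_owner() {
  local archive="$1"
  local dest="${2:-.}"
  mkdir -p "$dest"
  # --no-same-owner: extracted files belong to the current user instead of uid/gid 500,
  # so tar does not attempt the chown that fails with "Operation not permitted"
  tar --no-same-owner -xzf "$archive" -C "$dest"
}

# Usage, assuming profiles.tar.gz has already been downloaded to the current directory:
# extract_without_owner profiles.tar.gz .
```

This only changes how the archive is unpacked, so it would make the `chown` workaround unnecessary rather than replace the download itself.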

If you need the log file for debugging, please let me know and I'll send it to the email address you provide.
Thanks

Hi @minhtrung1997,

thanks for using it and reporting the problem.

can you please share with me the exact command you ran so I can try to reproduce it?

also, if you can send me the logs, that would be a great help for debugging.

thanks 😁

The logs would help, since I could not reproduce the error:

email: almeidafmarques@gmail.com

> nextflow run fmalmeida/bacannot -latest -profile docker --get_dbs --output new_dbs
N E X T F L O W  ~  version 22.04.5
Pulling fmalmeida/bacannot ...
 Fast-forward
Launching `https://github.com/fmalmeida/bacannot` [magical_plateau] DSL2 - revision: 14919aead7 [master]


------------------------------------------------------
  fmalmeida/bacannot v3.2
------------------------------------------------------
Core Nextflow options
  revision       : master
  runName        : magical_plateau
  containerEngine: docker
  launchDir      : /home/falmeida/Documents
  workDir        : /home/falmeida/Documents/work
  projectDir     : /home/falmeida/.nextflow/assets/fmalmeida/bacannot
  userName       : falmeida
  profile        : docker
  configFiles    : /home/falmeida/.nextflow/assets/fmalmeida/bacannot/nextflow.config

Download databases options
  get_dbs        : true

Input/output options
  output         : new_dbs

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use fmalmeida/bacannot for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.3627669

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/fmalmeida/bacannot/blob/master/CITATIONS.md
------------------------------------------------------
executor >  local (15)
[8f/676ed7] process > CREATE_DBS:PROKKA_DB        [100%] 1 of 1 ✔
[5e/2b64de] process > CREATE_DBS:MLST_DB          [100%] 1 of 1 ✔
[32/4e0c39] process > CREATE_DBS:KOFAMSCAN_DB     [100%] 1 of 1 ✔
[5e/9c56dc] process > CREATE_DBS:CARD_DB          [100%] 1 of 1 ✔
[c4/73c546] process > CREATE_DBS:RESFINDER_DB     [100%] 1 of 1 ✔
[2d/f540f8] process > CREATE_DBS:AMRFINDER_DB     [100%] 1 of 1 ✔
[37/6fef99] process > CREATE_DBS:ARGMINER_DB      [100%] 1 of 1 ✔
[fa/e4b5d1] process > CREATE_DBS:PLATON_DB        [100%] 1 of 1 ✔
[fe/c47052] process > CREATE_DBS:PLASMIDFINDER_DB [100%] 1 of 1 ✔
[2f/759348] process > CREATE_DBS:PHIGARO_DB       [100%] 1 of 1 ✔
[06/b7f6c3] process > CREATE_DBS:PHAST_DB         [100%] 1 of 1 ✔
[ef/4b77d7] process > CREATE_DBS:VFDB_DB          [100%] 1 of 1 ✔
[f3/044bf7] process > CREATE_DBS:VICTORS_DB       [100%] 1 of 1 ✔
[e3/b05a37] process > CREATE_DBS:ICEBERG_DB       [100%] 1 of 1 ✔
[22/947111] process > CREATE_DBS:ANTISMASH_DB     [100%] 1 of 1 ✔

Completed at: 10-Jan-2023 12:06:13
Duration    : 2h 32m 40s
CPU hours   : 8.2
Succeeded   : 15

Thanks. Unfortunately, I've lost the log of that run.
Anyway, there is still one problem at the ResFinder step (when running the test on the E. coli sample) that may be connected to the one I reported earlier:

Error executing process > 'BACANNOT:RESFINDER (ecoli_2)'

Caused by:
  Process `BACANNOT:RESFINDER (ecoli_2)` terminated with an error exit status (1)

Command executed:

  # activate env
  source activate resfinder
  
  # Make databases available
  # ln -rs Database-bacannot/resfinder_db/db_* $(dirname $(which run_resfinder.py))
  
  # Run resfinder acquired resistance
  run_resfinder.py \
      --inputfasta ecoli_2.fna -db_res Database-bacannot/resfinder_db/db_resfinder \
      -o resfinder \
      --species "Escherichia coli" \
      --min_cov  0.9 \
      --threshold 0.9 \
      --acquired ;
  
  # Fix name of pheno table
  mv resfinder/pheno_table.txt resfinder/args_pheno_table.txt &> /dev/null ;
  
  # Run resfinder pointfinder resistance
  run_resfinder.py \
      --inputfasta ecoli_2.fna -db_point Database-bacannot/resfinder_db/db_pointfinder \
      -o resfinder \
      --species "Escherichia coli" \
      --min_cov  0.9 \
      --threshold 0.9 \
      --point ;
  
  # Fix name of pheno table
  mv resfinder/pheno_table.txt resfinder/mutation_pheno_table.txt &> /dev/null ;
  
  # Convert to GFF
  resfinder2gff.py \
      -i resfinder/ResFinder_results_tab.txt > resfinder/results_tab.gff ;

Command exit status:
  1

Command output:
  (empty)

Command error:
  Could not locate ResFinder database path: /opt/conda/envs/resfinder/bin/db_resfinder

Work dir:
  /media/NAS_SHARE/test/WGS/BAC_ANNOT/bacannot/work/87/0b8b5c60387c2361cae5d88fe80323

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

I modified your code a bit because, as with KOfam above, the original version failed with "Operation not permitted" (log1.txt). When I then specified the database path and commented out the symlink, I got the error in log2.txt.

log1.txt
log2.txt

Okay, so the problem seems to appear when using Singularity. The only difficulty in debugging it is that I have very limited access to machines with Singularity, but I will try my best.

If we just removed the symlink, we would have to point the ResFinder tool to another location, just as you did with `-db_res`, and that would be the best option. However, for some reason, ResFinder still tries to find the database in its bin directory even though we point it elsewhere, and there is nothing we can do about that unless it is fixed in the tool itself.

The symlink was the option I found that at least allowed it to run.
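The symlink workaround writes into the directory that holds `run_resfinder.py`. Singularity images are mounted read-only by default, whereas a Docker container has a writable layer, which would explain why this step can fail only under Singularity. A minimal sketch of a guard for that step, assuming the same `Database-bacannot/resfinder_db/db_*` layout used in the command above. The function `link_resfinder_dbs` is hypothetical, and it only detects the read-only case instead of fixing it:

```shell
#!/usr/bin/env bash
# Sketch: only symlink the ResFinder databases next to run_resfinder.py when that directory is writable.
# link_resfinder_dbs is a hypothetical helper; the db_* layout mirrors the command in the log above.
set -euo pipefail

link_resfinder_dbs() {
  local db_dir="$1"
  local bin_dir="$2"
  if [ ! -w "$bin_dir" ]; then
    # e.g. a read-only Singularity image: report clearly instead of failing inside ln
    echo "cannot write to $bin_dir; ResFinder databases not linked" >&2
    return 1
  fi
  local db
  for db in "$db_dir"/db_*; do
    # -r: relative link, -s: symbolic, -f: replace a stale link left by a previous run
    ln -rsf "$db" "$bin_dir/"
  done
}

# Usage inside the process, assuming run_resfinder.py is on PATH:
# link_resfinder_dbs Database-bacannot/resfinder_db "$(dirname "$(which run_resfinder.py)")"
```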

I will try to find a machine where I can run it with Singularity so I can debug it.

Just to confirm: this error only appeared for ResFinder and KOfam, right?

So far, yes :)). I haven't finished running the test yet.

I'm not sure whether your image contains the resfinder command:
singularity shell work/singularity/fmalmeida-bacannot-v3.2_misc.img

I know it is there because the image is just a conversion of the Docker image. I just don't know exactly how Nextflow mounts it so that the tools remain available. Maybe it does more than just calling `singularity shell`.

I will try to get access to a machine with Singularity by tomorrow so I can run and debug it. I should most probably be able to reproduce the error now that I know you are using the singularity profile. The problem is clearly tied to that profile, because the docker profile works fine.

I will get this access and start debugging it. Maybe I will need to change the image a little or add some optional snippets for Singularity runs. But we will get this fixed :)

I look forward to hearing from you!!!

Hi @minhtrung1997,
Just to let you know that I am still trying to find a machine where I can test the Singularity execution. Hopefully I will have an update by next week.

Hi @minhtrung1997,
I am happy to say that I found a machine and could reproduce the errors. The ResFinder error only required a very small change to the image.

I made the change and it is working. You can delete your Singularity cache and images and run again, so the pipeline fetches the updated image.
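A minimal sketch of removing only the cached bacannot images, assuming Nextflow's default Singularity cache location (`work/singularity`, the directory used in the `singularity shell` command earlier in this thread) unless `NXF_SINGULARITY_CACHEDIR` points elsewhere. The function name is hypothetical, and it only deletes files whose names start with `fmalmeida-bacannot-`:

```shell
#!/usr/bin/env bash
# Sketch: remove cached bacannot Singularity images so Nextflow pulls fresh ones on the next run.
# Cache location: first argument, else NXF_SINGULARITY_CACHEDIR, else work/singularity (assumption).
# clear_bacannot_images is a hypothetical helper, not part of the pipeline.
set -euo pipefail

clear_bacannot_images() {
  local cache_dir="${1:-${NXF_SINGULARITY_CACHEDIR:-work/singularity}}"
  if [ ! -d "$cache_dir" ]; then
    echo "no image cache at $cache_dir" >&2
    return 0
  fi
  # Only bacannot images are deleted; images from other pipelines are left alone
  find "$cache_dir" -maxdepth 1 -type f -name 'fmalmeida-bacannot-*.img' -print -delete
}

# Usage: clear_bacannot_images work/singularity
```

Singularity keeps its own download cache as well, which can be cleared separately with `singularity cache clean`.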

However, I still haven't found time to debug the database download error. Since you already have the databases and can run the analysis, I will look into it more calmly this week :)

Hi @minhtrung1997,
I am struggling a bit to get this one fixed, but I am still looking into it.
One thing that came to mind is that it might also make sense to provide the databases as a .tar.gz that users can download if they don't want to build them.
Do you think that would be a good alternative for such cases, at least for Singularity until this fix is ready?
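One way such an archive could be built and consumed without repeating the ownership errors at the top of this issue: GNU tar can record a neutral numeric owner with `--owner`, `--group` and `--numeric-owner`, and the consumer extracts with `--no-same-owner`, which is the flag that actually prevents the `chown`. A minimal sketch that also ships a SHA-256 checksum, assuming the databases live in a directory such as `new_dbs` (the output name from the maintainer's run above). The function names `pack_dbs` and `unpack_dbs` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: package a database directory as a .tar.gz plus checksum, and unpack it safely.
# pack_dbs/unpack_dbs are hypothetical helpers; assumes GNU tar and coreutils sha256sum.
set -euo pipefail

pack_dbs() {
  local db_dir="$1"
  local archive="$2"
  # Record a neutral numeric owner instead of whatever uid/gid built the databases
  tar --owner=0 --group=0 --numeric-owner -czf "$archive" \
    -C "$(dirname "$db_dir")" "$(basename "$db_dir")"
  # Store only the file name so the checksum can be verified next to the archive
  (cd "$(dirname "$archive")" && sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256")
}

unpack_dbs() {
  local archive="$1"
  local dest="$2"
  # Verify integrity before extracting
  (cd "$(dirname "$archive")" && sha256sum -c --quiet "$(basename "$archive").sha256")
  mkdir -p "$dest"
  # Never restore ownership from the archive, even when running as root
  tar --no-same-owner -xzf "$archive" -C "$dest"
}

# Usage: pack_dbs new_dbs bacannot_dbs.tar.gz, then on the user side:
# unpack_dbs bacannot_dbs.tar.gz databases/
```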

I really appreciate the pipeline and the idea, as well as the way your team supports issues. To me, downloading the databases is a one-time job, so providing them as a tar.gz is fine.

Also, I think it would be great if MOB-suite and IntegronFinder 2 could be added to the collection, as those tools are well reviewed and are also recommended by our seniors. If your team could help with this, I look forward to hearing from you. Once again, thank you very much❀

Great to hear that.
So I will work on providing this tar.gz, and I will also create a few issues so this can be planned and tackled. I will tag you there so you can give any input you like.
IntegronFinder 2.0 is already in the plans, see #50.
Also, I don't see any reason why MOB-suite could not be added, so I will create a ticket for that too.

The main problem is that I usually don't have much time for these enhancements, so they may take a while... But if you feel comfortable forking the repository and helping to add these tools, that would be very much appreciated and welcome.

It can take a while because each new tool has to go through these steps:

  • the tool has to be added to the Docker images
  • it has to produce a tabular file that can be merged into the final GFF file
  • its results have to be added to the HTML reports
  • its results have to be added to JBrowse.
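For the second step, a tool's tabular output has to become GFF lines that can be merged with the rest of the annotation. A minimal sketch in awk, assuming a hypothetical headerless tab-separated input with the columns contig, start, end, strand and feature name. Real tools each need their own column mapping (as `resfinder2gff.py` does for ResFinder), and the source tag, the `sequence_feature` type and the `Name` attribute are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: convert a simple TSV (contig, start, end, strand, name) into GFF3 lines.
# Input layout, feature type and attribute choice are assumptions for illustration only;
# names are assumed not to contain characters that GFF3 requires escaping (; = , etc.).
set -euo pipefail

tsv_to_gff() {
  local source_tag="$1"
  awk -v src="$source_tag" 'BEGIN { FS = OFS = "\t"; print "##gff-version 3" }
    /^#/ || NF < 5 { next }   # skip comment lines and incomplete rows
    {
      # GFF3 columns: seqid source type start end score strand phase attributes
      print $1, src, "sequence_feature", $2, $3, ".", $4, ".", "Name=" $5
    }'
}

# Usage: tsv_to_gff MY_TOOL < my_tool_results.tsv > my_tool_results.gff
```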

But I will open the issues so at least these enhancements can start to happen.