jodyphelan/NTM-Profiler

Can't run after update


Thanks for letting me know about the update, Jody. Unfortunately, all of my runs now fail with an error I can't decode (the report says version 0.2.0, but I confirmed I'm running 0.2.1; the version string just wasn't updated in the latest release):

$ cat N200006_S289.errlog.txt

ntm-profiler error report

  • OS: linux
  • ntm-profiler version: 0.2.0
  • pathogen-profiler version: 2.0.0
  • Program call:
{'no_clean': False, 'read1': '/home/pgc29/scratch60/Taiwan_MKansasii/dataraw/N200006_S289_R1.fastq.gz', 'read2': '/home/pgc29/scratch60/Taiwan_MKansasii/dataraw/N200006_S289_R2.fastq.gz', 'bam': None, 'fasta': None, 'vcf': None, 'platform': 'illumina', 'resistance_db': None, 'external_resistance_db': None, 'species_db': 'ntmdb', 'external_species_db': None, 'prefix': 'N200006_S289', 'dir': '.', 'csv': False, 'txt': False, 'add_columns': None, 'add_mutation_metadata': False, 'call_whole_genome': False, 'mapper': 'bwa', 'caller': 'freebayes', 'calling_params': None, 'min_depth': 10, 'af': 0.1, 'reporting_af': 0.1, 'coverage_fraction_threshold': 0, 'missing_cov_threshold': None, 'species_only': False, 'no_trim': False, 'no_flagstat': False, 'no_clip': True, 'no_delly': False, 'no_species': False, 'no_mash': False, 'output_kmer_counts': False, 'add_variant_annotations': False, 'threads': 1, 'verbose': 0, 'no_cleanup': False, 'delly_vcf': None, 'func': <function main_profile at 0x2b25417c41f0>, 'software_name': 'ntm-profiler', 'tmp_prefix': 'abdb8f1d-40f4-4030-92c7-72565c0a47f0', 'files_prefix': './abdb8f1d-40f4-4030-92c7-72565c0a47f0'}

Traceback:

  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/bin/ntm-profiler", line 322, in <module>
    args.func(args)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/bin/ntm-profiler", line 89, in main_profile
    species_prediction = pp.speciate(args)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/cli.py", line 64, in speciate
    kmer_dump = fastq_class.get_kmer_counts(args.files_prefix,threads=args.threads)
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/fastq.py", line 127, in get_kmer_counts
    run_cmd(f"kmc {bins} -t{threads} -sf{threads} -sp{threads} -sr{threads} -k{klen} @{tmp_file_list} {tmp_prefix} {tmp_prefix}")
  File "/gpfs/ysm/project/cohen_theodore/pgc29/conda_envs/ntmprofiler2/lib/python3.8/site-packages/pathogenprofiler/utils.py", line 391, in run_cmd
    raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,stderr.decode()))

Value:

Command Failed:
set -u pipefail; kmc  -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1
stderr:
*****************
Stage 1: 100%
Stage 2: 100%
/bin/sh: line 1: 150965 Bus error               kmc -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1

I tried running the kmc command by itself and got the same error:

$ kmc  -t1 -sf1 -sp1 -sr1 -k31 @219bda85-28f9-43c3-9f0c-461bc10d96e1.list 219bda85-28f9-43c3-9f0c-461bc10d96e1 219bda85-28f9-43c3-9f0c-461bc10d96e1
*****************
Stage 1: 100%
Stage 2: 100%
Bus error

This is on my university's cluster and not my own machine, so a bit harder to debug, but I'll keep looking into it.

Ah, it's an out-of-memory issue. My cluster instance had a cap of 8 GB of memory, and the default for kmc is 12 GB. Per kmc's documentation: "-m<size> - max amount of RAM in GB (from 1 to 1024); default: 12". Adding the argument -m8 made it run fine. kmc needs a minimum of 2 GB of memory, and when I compared runs it took about 45 seconds per genome with either 2 GB or 8 GB, so maybe set the command to -m2. Alternatively, you could just warn users to allocate at least 12 GB.
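
For illustration, here is a minimal sketch of how the kmc call could carry an explicit memory cap. It is not pathogenprofiler's actual code: `build_kmc_cmd` and its `max_mem_gb` parameter are hypothetical names, and the 2 GB floor is only the minimum observed in this thread. The `-m` flag and its 1024 GB upper bound are taken from the kmc documentation quoted above.

```python
import shlex

# Assumptions: 2 GB floor as reported in this thread; 1024 GB ceiling
# from kmc's documented range for -m.
KMC_MIN_GB = 2
KMC_MAX_GB = 1024


def build_kmc_cmd(file_list, prefix, threads=1, klen=31, max_mem_gb=2):
    """Build a kmc command string with an explicit -m memory cap (hypothetical helper)."""
    # Clamp the requested memory into the range kmc is expected to accept.
    mem = max(KMC_MIN_GB, min(int(max_mem_gb), KMC_MAX_GB))
    args = [
        "kmc",
        f"-m{mem}",
        f"-t{threads}",
        f"-sf{threads}",
        f"-sp{threads}",
        f"-sr{threads}",
        f"-k{klen}",
        f"@{file_list}",  # '@' tells kmc the input is a file listing the FASTQ paths
        prefix,           # output database prefix
        prefix,           # working directory, matching the call in the traceback
    ]
    # Quote each argument so the string is safe to hand to a shell.
    return " ".join(shlex.quote(a) for a in args)


print(build_kmc_cmd("sample.list", "sample", max_mem_gb=8))
```

With `max_mem_gb=8` this reproduces the command that worked on the 8 GB instance, and leaving the default gives the `-m2` variant suggested above.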

Ah, thanks for looking into this. I'll add that parameter and update the release.