PacificBiosciences/FALCON

FALCON on PBSPro 0-rawreads/tan-runs/tan_003 fails with datander error


I have the same issue as @amnag and others. I read the comments and found no path to a solution. I checked the stderr in the task-run directory /0-rawreads/tan-runs/tan_003: it looks like datander gets killed. How do I prevent this from happening?

I have a fresh install of pb-falcon, pb-dazzler and pbmm2 from bioconda on linux-64.

The critical error is reported by pypeflow for 0-rawreads/tan-runs/tan_003/task.json. The file looks like it contains some odd placeholders. I'm guessing there's a config error; hours spent ploughing through the various versions of config files posted on GitHub have only brought more confusion.

### task.json 
{
    "bash_template_fn": "template.sh",
    "inputs": {
        "bash_template": "../../tan-split/bash_template.sh",
        "units_of_work": "../../tan-chunks/tan_003/some-units-of-work.json"
    },
    "outputs": {
        "results": "some-done-files.json"
    },
    "parameters": {
        "pypeflow_mb": 4000,
        "pypeflow_nproc": 4
    }
}

This is the stderr output of tan_003:

### tan_003 stderr
[INFO]$('bash -vex run_datander.sh')
module () {  eval `/cm/local/apps/environment-modules/3.2.10/Modules/$MODULE_VERSION/bin/modulecmd bash $*`
}
datander -v -k16 -w7 -e0.7 -P. raw_reads.13 raw_reads.14
+ datander -v -k16 -w7 -e0.7 -P. raw_reads.13 raw_reads.14
run_datander.sh: line 1: 22873 Killed                  datander -v -k16 -w7 -e0.7 -P. raw_reads.13 raw_reads.14
[WARNING]Call 'bash -vex run_datander.sh' returned 35072.
Traceback (most recent call last):
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/dazzler.py", line 1528, in <module>
    main()
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/dazzler.py", line 1524, in main
    args.func(args)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/dazzler.py", line 1053, in cmd_tan_apply
    tan_apply(args.db_fn, args.script_fn, args.job_done_fn)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/dazzler.py", line 394, in tan_apply
    io.syscall('bash -vex {}'.format(os.path.basename(script_fn)))
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call 'bash -vex run_datander.sh' returned 35072.
WARNING:root:Call '/bin/bash user_script.sh' returned 256.
INFO:root:CD: 'uow-00' -> '/ngs/data/falcon/0-rawreads/tan-runs/tan_003'
Traceback (most recent call last):
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 117, in <module>
    main()
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 113, in main
    run(**vars(args))
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/falcon_kit/mains/generic_run_units_of_work.py", line 66, in run
    pypeflow.do_task.run_bash(script, inputs, outputs, params)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 201, in run_bash
    util.system(cmd)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 256.
2019-12-21 21:52:34,788 - root - WARNING - Call '/bin/bash user_script.sh' returned 256.
2019-12-21 21:52:34,823 - root - INFO - CD: '/ngs/data/falcon/0-rawreads/tan-runs/tan_003' -> '/ngs/data/falcon/0-rawreads/tan-runs/tan_003'
2019-12-21 21:52:34,823 - root - INFO - CD: '/ngs/data/falcon/0-rawreads/tan-runs/tan_003' -> '/ngs/data/falcon/0-rawreads/tan-runs/tan_003'
2019-12-21 21:52:34,898 - root - CRITICAL - Error in /ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py with args="{'json_fn': '/ngs/data/falcon/0-rawreads/tan-runs/tan_003/task.json',\n 'timeout': 30,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/ngs/software/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 281, in <module>
    main()
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 273, in main
    run(**vars(parsed_args))
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 267, in run
    run_cfg_in_tmpdir(cfg, tmpdir, '.')
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 242, in run_cfg_in_tmpdir
    run_bash(bash_template, myinputs, myoutputs, parameters)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/do_task.py", line 201, in run_bash
    util.system(cmd)
  File "/ngs/software/miniconda2/lib/python2.7/site-packages/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 256.
+++ pwd
++ echo 'FAILURE. Running top in /ngs/data/falcon/0-rawreads/tan-runs/tan_003 (If you see -terminal database is inaccessible- you are using the python bin-wrapper,
 so you will not get diagnostic info. No big deal. This process is crashing anyway.)'
++ rm -f top.txt
++ which python
++ which top
++ env -u LD_LIBRARY_PATH top -b -n 1
++ env -u LD_LIBRARY_PATH top -b -n 1
++ pstree -apl

real	1m5.638s
user	0m1.711s
sys	0m10.342s
+ finish
+ echo 'finish code: 1'
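
A note on the numbers in this log, for anyone trying to read them: `35072` and `256` look like raw POSIX wait statuses rather than plain exit codes. The sketch below decodes them under that assumption (the helper name `decode_wait_status` is made up for illustration, it is not part of pypeflow or falcon_kit).

```python
import signal


def _signal_name(num):
    """Best-effort signal name; falls back to the number on unknown signals."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def decode_wait_status(status):
    """Split a 16-bit POSIX wait status into exit code and signal info."""
    exit_code = (status >> 8) & 0xFF  # high byte: exit code of the child
    term_signal = status & 0x7F       # low 7 bits: signal that killed the child itself
    killed_by = None
    if term_signal:
        killed_by = _signal_name(term_signal)
    elif exit_code > 128:
        # Shell convention: bash exits with 128 + N when the command it ran
        # was terminated by signal N.
        killed_by = _signal_name(exit_code - 128)
    return {"exit_code": exit_code, "term_signal": term_signal, "killed_by": killed_by}


print(decode_wait_status(35072))  # exit_code is 137, i.e. 128 + 9
print(decode_wait_status(256))    # exit_code is 1
```

Read this way, `bash -vex run_datander.sh` exited with 137, which matches the `Killed` line for datander (signal 9), and the outer `user_script.sh` simply exited with 1. A signal 9 from outside the process is often, though not necessarily, the scheduler or the kernel enforcing a memory limit.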

My config file:

[job.defaults]
job_type = PBS
njobs = 8
pwatcher_type = blocking
submit = qsub -S /bin/bash -V -q ${JOB_QUEUE} \
-N ${JOB_ID} \
-m abe \
-o "${STDOUT_FILE}" \
-e "${STDERR_FILE}" \
-l select=1:mem=2gb:ncpus=8 \
-W block=true "${CMD}"

JOB_QUEUE = workq
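
One thing that stands out when comparing this section with the task.json above: the submit string asks PBS for `mem=2gb`, while the task declares `pypeflow_mb: 4000`. The sketch below makes that comparison explicit. It is only a rough check under the assumption that the PBS `mem=` value is what limits the job; the helper `requested_mem_mb` is hypothetical and the submit string is flattened onto one line for the example.

```python
import re

# Conversion factors from PBS size suffixes to megabytes.
_UNITS_MB = {"kb": 1 / 1024, "mb": 1, "gb": 1024, "tb": 1024 * 1024}


def requested_mem_mb(submit):
    """Return the mem= request found in a PBS resource string, in MB (None if absent)."""
    match = re.search(r"\bmem=(\d+)(kb|mb|gb|tb)\b", submit, flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS_MB[match.group(2).lower()]


# Values copied from the config and task.json in this post.
submit = ('qsub -S /bin/bash -V -q ${JOB_QUEUE} -N ${JOB_ID} '
          '-l select=1:mem=2gb:ncpus=8 -W block=true "${CMD}"')
task_parameters = {"pypeflow_mb": 4000, "pypeflow_nproc": 4}

granted = requested_mem_mb(submit)
needed = task_parameters["pypeflow_mb"]
if granted is not None and granted < needed:
    print(f"PBS request is {granted:.0f} MB but the task expects {needed} MB")
```

If the memory limit really is the cause, raising `mem=` in the select statement to at least the task's expectation would be the first thing to try. As far as I can tell (treat this as an assumption), the pb-assembly example configs avoid the mismatch by putting `${MB}` and `${NPROC}` placeholders in the submit string instead of fixed values.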

[General]
input_fofn = input.fofn
#input_fofn = preads.fofn

input_type = raw
#input_type = preads

# Falcon-Unzip will be run subsequently, so we were advised to run the Falcon job with 40-fold seed coverage per haplotype, i.e. 80-fold seed coverage in the diploid case.
# (integer) minimum length of seed-reads used for pre-assembly stage
# If '-1', then auto-calculate the cutoff based on genome_size and seed_coverage.
length_cutoff = -1
seed_coverage = 80
genome_size = 40000000

# (integer) minimum length of seed-reads used after pre-assembly, for the "overlap" stage
# value recommended for 40Mb fungi genome = 8000
length_cutoff_pr = 5000

### START default values ##########################
pa_daligner_option = -k16 -h35 -w7 -e.70 -l40 -s100
# Added into HPCdaligner, HPCTANmask, HPCREPmask.
ovlp_daligner_option = -k15 -h60 -w6 -e.95 -l40 -s100
# Added into HPCdaligner.
# (There is currently no HPCTANmask, HPCREPmask for ovlp (stage-1).)

pa_HPCdaligner_option = -v -D24
# Passed to `HPCdaligner` (after `pa_daligner_option`) during pre-assembly stage.
# We will add `-H` based on "length_cutoff".
ovlp_HPCdaligner_option = -v -D24 -l500
# Passed to `HPC.daligner` during overlap stage.

pa_DBsplit_option = -x250 -s500 -a
# Passed to `DBsplit` during pre-assembly stage.
ovlp_DBsplit_option = -s50 -a
# Passed to `DBsplit` during overlap stage.
### END default values ##########################

falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 8
overlap_filtering_setting = --max-diff 120 --max-cov 120 --min-cov 2 --n-core 12
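
On `length_cutoff = -1` in the `[General]` section: my understanding is that the cutoff is chosen so that the reads at or above it add up to roughly `seed_coverage * genome_size` bases. The sketch below illustrates that idea only; `calc_length_cutoff` is a hypothetical name and this is not copied from falcon_kit.

```python
def calc_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Return the shortest read length still needed to reach the target seed bases.

    Reads are taken from longest to shortest until their cumulative length
    reaches genome_size * seed_coverage.
    """
    target = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    raise ValueError("not enough bases in the reads to reach the requested seed coverage")


# Toy numbers: a 10 kb "genome" at 2x seed coverage needs 20,000 seed bases.
toy_reads = [12000, 9000, 7000, 5000, 3000]
print(calc_length_cutoff(toy_reads, genome_size=10000, seed_coverage=2))  # 9000

# With the values in this config the target is 80 * 40,000,000 bases.
print(80 * 40000000)  # 3200000000
```

With `seed_coverage = 80` and `genome_size = 40000000` the target is 3.2 Gb of seed reads, so if the raw data holds less than that the auto-calculated cutoff cannot be met in this simplified model.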

For the lost souls who land here: I used the defaults from https://github.com/PacificBiosciences/pb-assembly and got past the tan_003 error (the job then failed at daligner-runs/j_0036 ...)