marbl/canu

Using Canu in a pipeline


I am trying to use Canu in a pipeline and am struggling because it submits its own jobs. I am using Git Bash and submitting job scripts on an HPC system. Here is the Canu command I am using; it works well on its own:

canu -p asm genomeSize=260m gridOptions="-p normal -A mwf" -pacbio-hifi *.fastq.gz

That part works. However, my script moves on to the next commands before the assembly file is written, because the jobs Canu submitted are still running. For example, if I put this command after my Canu command, purge_dups fails because asm.contigs.fasta does not exist yet:

python $HOME/local/purge_dups/scripts/pd_config.py asm.contigs.fasta purge.fofn

I need to pause the rest of my script until Canu's external jobs complete and the .fasta file exists. I have tried the wait command in a few ways, without success. Any insight would be greatly appreciated!

The shell's wait only waits for background processes started by the script itself, not for jobs Canu submits to the scheduler. Instead, Canu supports an onSuccess option, which can be an arbitrary command to run after the last jobs complete, and an onFailure option, which runs if the assembly does not complete. You can put the rest of your script into these options, and it won't run until the assembly has finished.
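A minimal sketch of what the continuation could look like, assuming (as Canu's parameter documentation describes) that the onSuccess command runs inside the assembly directory and receives the -p prefix as its first argument; check this against your Canu version. The function name post_canu is hypothetical; the purge_dups command is the one from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical hook script holding the part of the pipeline that must wait
# for Canu. Assumption: Canu runs it from inside the assembly directory and
# passes the -p prefix (here "asm") as the first argument.

post_canu() {
  local prefix="${1:-asm}"               # fall back to the prefix used above
  local contigs="${prefix}.contigs.fasta"

  # Stop with a clear message if the assembly is missing or empty.
  if [[ ! -s "$contigs" ]]; then
    echo "post_canu: $contigs not found in $(pwd)" >&2
    return 1
  fi

  # The original next step of the pipeline.
  python "$HOME/local/purge_dups/scripts/pd_config.py" "$contigs" purge.fofn
}
```

In the hook script itself, end with a call such as post_canu "$@" so the function runs when Canu invokes the script; the message on stderr makes a missing assembly easy to spot instead of failing silently.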

OK, great. How do I use the parameter correctly? I have put the rest of my script in an executable .sh file and am running the command below. Canu runs properly and all of its jobs finish without error, but nothing seems to run afterward.

canu -p asm genomeSize=260m gridOptions="-p normal -A mwf" "onSuccess=srun test.sh" -pacbio-hifi *.fastq.gz

Any idea what is wrong here? I also tried adding onFailure and moving the quotation marks around. The srun test.sh command runs fine if I put it directly into my original script, but Canu doesn't seem to run it with my current command.

I expect Canu is trying to run it but hitting an error, which should be logged in the canu.out file. The command is run from inside the Canu assembly directory, so I would guess it can't find test.sh and you need to give the full path to it instead.
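A small sketch of building that full path before launching Canu. The helper name abs_path is hypothetical; it resolves a path relative to the directory you launch from, which is what a relative name like test.sh would otherwise be missing once Canu runs the hook from its assembly directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a path relative to the current directory into an
# absolute one, so a hook command still finds the script after Canu changes
# into its assembly directory.

abs_path() {
  local target="$1" dir
  # Resolve the directory part in a subshell, then re-attach the file name.
  dir="$(cd "$(dirname "$target")" && pwd)" || return 1
  printf '%s/%s\n' "$dir" "$(basename "$target")"
}
```

Used with the command from the thread, it would look like this (when test.sh sits in the launch directory, "onSuccess=srun $PWD/test.sh" is equivalent):

canu -p asm genomeSize=260m gridOptions="-p normal -A mwf" "onSuccess=srun $(abs_path test.sh)" -pacbio-hifi *.fastq.gz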
