pb-cdunn/FALCON

Incremental daligner

It might be many weeks before this could have priority, but it's fine to discuss ideas and requirements here.

Possible questions to discuss:

  • What is the incremental input? An additional fasta file?
  • If so, then how do you specify it? In the FOFN?
  • If so, then should we run FALCON stage-0 for each fasta input?
  • What about currently running jobs? There can be a race on the DB. Can you wait for a run to finish completely before adding an extra fasta?
  • If so, should the user rm the stage-1 and stage-2 outputs manually before re-running?
  • What about via smrtlink GUI? (Suddenly, this could become a very hard problem.)
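
On the DB race question, one option is a simple lock file that both the "add a fasta" step and the workflow take before touching the DB. A minimal sketch, assuming a hypothetical helper `db_lock` and a lock path of your choosing (FALCON does not provide either); exclusive creation is atomic on local filesystems, but behaviour on network filesystems may differ:

```python
import os
from contextlib import contextmanager


@contextmanager
def db_lock(lock_path):
    """Hold an exclusive lock file while the body runs.

    Hypothetical helper for illustration; not part of FALCON.
    Raises FileExistsError if another process already holds the lock.
    """
    # O_CREAT | O_EXCL fails if the file already exists, so only one
    # process can create the lock file at a time.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner's PID to help debug stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A second caller gets `FileExistsError` instead of silently racing, so "wait for the run to finish" becomes an explicit check rather than a convention.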

The use case is somewhat unusual. This is for an extremely large genome (17 Gb); I have ~20x coverage already, and sequencing will continue over the next few months. If I wait for all the data and run the assembly all at once, it will likely take weeks, which is not a great use of the cluster. My plan is to run the initial alignments incrementally, running jobs slowly when the cluster is underutilized. The hope is to have a complete assembly not long after sequencing finishes, ideally by the beginning of September.
It is easy enough to run HPC.daligner to generate the incremental commands, but the sorting into FALCON jobs is obviously more complicated. My plan right now is largely manual.
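
To make the bookkeeping concrete, here is a rough sketch of which block-vs-block comparisons an incremental run still needs. It assumes blocks `1..n_old` were already compared all-against-all and are unchanged by the new data (a partially filled last block would have to be counted as new). The function name `new_block_pairs` is made up for illustration, and this is not what HPC.daligner emits, only the set of pairs it would have to cover:

```python
def new_block_pairs(n_old, n_total):
    """List the (i, j) block pairs, with j <= i, still to be aligned.

    Assumes blocks 1..n_old were fully compared in an earlier run and
    have not changed; blocks n_old+1..n_total are new.
    """
    if not 0 <= n_old <= n_total:
        raise ValueError("need 0 <= n_old <= n_total")
    # Each new block i is compared against every block up to and
    # including itself; pairs among old blocks are skipped.
    return [
        (i, j)
        for i in range(n_old + 1, n_total + 1)
        for j in range(1, i + 1)
    ]


if __name__ == "__main__":
    # Two old blocks, two new ones: 7 of the 10 total pairs remain.
    for pair in new_block_pairs(2, 4):
        print(pair)
```

Grouping these pairs by the new block `i` gives a natural unit for one job per new block, which could then be mapped onto FALCON job directories by hand.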