pydoit/doit

Support of LSF or GRID?

texan4ever opened this issue · 2 comments

I found DOIT while looking for alternatives to Make (I find the Make syntax extremely cryptic and difficult to enhance 6 months later). I am really enjoying DOIT: it is powerful and easy to read.

However, some of my usage involves tasks that can easily run for days and have to be run on a CPU farm.

What are the chances of DOIT being enhanced to submit tasks to a CPU farm that is controlled by LSF or GRID?

That's feasible.

First step would be to have some sample code to work on.
Some example computation + pipeline you would submit to a CPU farm.
From that we can discuss the doit interface and implementation.

Can you provide some sample code of how you usually use a CPU farm?

Thanks for responding back.

LSF and GRID have commands used to submit jobs to the farm. The user would most likely need to specify the following global information at the top of the dodo.py file:

  • name of submit command (LSF=bsub, GRID=qsub)
  • Options for bsub/qsub to use for submission
  • Syntax for RAM resources. Ex: -l 'mem_free=%dG'
  • Syntax for CPU resources. Ex: -pe mt %d
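To make the list above concrete, here is a minimal sketch of what such global settings could look like at the top of a dodo.py, and how a submit command could be assembled from them. The names FARM and build_submit_args are hypothetical and are not part of doit; only the qsub options themselves come from the examples in this comment.

```python
import shlex

# Hypothetical global settings (illustrative names, not doit options).
FARM = {
    "submit_cmd": "qsub",               # would be "bsub" for LSF
    "submit_opts": ["-cwd", "-V"],      # options used for every submission
    "mem_syntax": "-l 'mem_free=%dG'",  # filled in with memory in GB
    "cpu_syntax": "-pe mt %d",          # filled in with the CPU count
}


def build_submit_args(farm, script, mem_gb=None, cpus=None):
    """Return the submit command as an argument list for subprocess."""
    args = [farm["submit_cmd"], *farm["submit_opts"]]
    if mem_gb is not None:
        # shlex.split strips the shell quoting around the resource string.
        args += shlex.split(farm["mem_syntax"] % mem_gb)
    if cpus is not None:
        args += shlex.split(farm["cpu_syntax"] % cpus)
    args.append(script)
    return args


print(build_submit_args(FARM, "runfoo.csh", mem_gb=50, cpus=16))
```

Returning a list rather than a single string avoids quoting problems when the command is later passed to subprocess.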

Memory and CPU resources:

  • User could specify the memory and CPU resources for the entire dodo.py at the top of the file
  • However, the user would most likely want to override the global specification for individual tasks
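One possible way to express "global default, per-task override" is a plain dictionary merge. This is only a sketch: DEFAULT_RESOURCES, TASK_RESOURCES and resources_for are made-up names, and how such settings would actually be attached to a doit task is an open design question.

```python
# Hypothetical defaults that apply to every task in the dodo.py.
DEFAULT_RESOURCES = {"mem_gb": 1, "cpus": 1}

# Hypothetical per-task overrides, keyed by task name.
TASK_RESOURCES = {
    "run_foo": {"mem_gb": 50, "cpus": 16},
    "archive": {"mem_gb": 2},  # overrides memory only, keeps the default CPU count
}


def resources_for(task_name):
    """Merge the global defaults with any override for this task."""
    merged = dict(DEFAULT_RESOURCES)
    merged.update(TASK_RESOURCES.get(task_name, {}))
    return merged


print(resources_for("run_foo"))
print(resources_for("archive"))
```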

The challenging part would be determining when a job completes. qsub exits with 0 or 1 depending on whether the job was submitted successfully, and prints the job number to stdout:
Your job 1285583 ("") has been submitted
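The job number could be pulled out of that message with a small regular expression. This is a sketch that assumes the message always starts with "Your job" followed by the number, as in the line above; parse_job_id is a hypothetical helper name.

```python
import re

# Matches the job number in a message like the one shown above.
JOB_ID_RE = re.compile(r"Your job (\d+)")


def parse_job_id(qsub_stdout):
    """Return the job number printed by qsub, or raise if none is found."""
    match = JOB_ID_RE.search(qsub_stdout)
    if match is None:
        raise ValueError("no job number found in: %r" % qsub_stdout)
    return int(match.group(1))


print(parse_job_id('Your job 1285583 ("") has been submitted'))
```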

As a user I am able to run qstat to see which jobs are running, or I can run qstat -j to see the status of a single job (but that gives very verbose output). So there would most likely need to be a way to specify how often the queue is polled by doit.

Output of qstat:
job-ID prior job name user state submit/start at
1285669 0.47662 r 01/31/2022 16:19:16
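Polling could then look roughly like the sketch below. It assumes that every job line in the qstat output starts with the numeric job ID, as in the output above, and that a job which no longer appears in the list has finished. All function names are hypothetical, and this only detects that the job left the queue, not whether it succeeded.

```python
import subprocess
import time


def job_ids_in_queue(qstat_output):
    """Collect job IDs from lines whose first field is a number."""
    ids = set()
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(int(fields[0]))
    return ids


def run_qstat():
    """Run qstat and return its stdout (requires a Grid Engine client)."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_job(job_id, poll_seconds=60, get_output=run_qstat, sleep=time.sleep):
    """Block until job_id is no longer listed, polling every poll_seconds."""
    while job_id in job_ids_in_queue(get_output()):
        sleep(poll_seconds)
```

The get_output and sleep parameters exist only so that the loop can be exercised without a real queue; poll_seconds is the user-configurable polling interval mentioned above.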

Two examples for Oracle Grid Engine:

Example1:
Submit runfoo.csh and request 50 GB of RAM and 16 CPUs
qsub -N 'name of job' -P 'name-of-queue' -cwd -V -pe mt 16 -o stdout.log -l 'os_bit=64,mem_free=50G' runfoo.csh

Example 2:
Submit the Linux command "tar cvfz file.tgz somedirectory". Request 1 GB of RAM and use the default of 1 CPU
NOTE: multiple -l options can be used
qsub -N 'name of job' -P 'name of queue' -cwd -V -l 'os_bit=64' -l 'mem_free=1G' "tar cvfz file.tgz somedirectory > tar.log"