flux-framework/flux-workflow-examples

semantics of arguments to `example7` and `example10`

I only caught this when running them back-to-back: when you run `bookkeeper.py 5` in example7, it actually runs 10 jobs (5 of each kind). When you run any of the scripts in example10 with 10 as the argument, 10 jobs get run (5 of each kind). I think the semantics of the bookkeeper argument should be changed (if possible) to match those of the waits in example10.
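To make the mismatch concrete, here is a minimal sketch of the two interpretations of the argument. It does not use Flux at all; the function names and the `compute`/`io` keys are hypothetical and only illustrate the counting, and the even split for an odd argument is an assumption.

```python
def per_kind_counts(n):
    """example7-style reading (hypothetical helper): n is the number of jobs of EACH kind."""
    return {"compute": n, "io": n}


def total_counts(n):
    """example10-style reading (hypothetical helper): n is the TOTAL number of jobs.

    Assumption: the total is split evenly, rounding down for odd n.
    """
    half = n // 2
    return {"compute": half, "io": half}


if __name__ == "__main__":
    # Same job mix, but the two scripts need different arguments to get it.
    print("bookkeeper-style, arg 5:", per_kind_counts(5))
    print("example10-style, arg 10:", total_counts(10))
```

Under the first reading, an argument of 5 yields 10 jobs in total; under the second, an argument of 10 is needed for the same 10 jobs.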

Thanks for pointing this out @SteVwonder! I agree, the semantics for this could definitely be made clearer.

On another note about example7: which version of Flux are you running this example against? I just ran it against flux-core v0.14, and it looks like `job_state_cb()` never gets called. Here's my output:

bash-4.2$ ./bookkeeper.py 3
187133067264
187468611584
187888041984
188324249600
188726902784
189230219264
bookkeeper: all jobs submitted
bookkeeper: waiting until all jobs complete
bookkeeper: all jobs completed

However, the jobs will run for their set time (120 seconds in this case):

bash-4.2$ flux jobs
             JOBID USER     NAME       STATE    NTASKS TIME
      189230219264 moussa1  io-forward RUN           3 2.8s
      188726902784 moussa1  compute.py RUN           6 2.9s
      188324249600 moussa1  io-forward RUN           3 2.9s
      187888041984 moussa1  compute.py RUN           6 2.9s
      187468611584 moussa1  io-forward RUN           3 2.9s
      187133067264 moussa1  compute.py RUN           6 3s

I know that Frank D made some changes to example 7 to make it work under the latest version of Flux. Here is the final script we ended up with for the tutorial: https://github.com/flux-framework/Tutorials/pull/3/files#diff-3c1bff2672c3ffd8cba68cb8f102cdce

Does that fix the issue?

Just ran it and still got the same output as listed above. Maybe it has something to do with the Flux version I am running against? I am running it against a locally installed version of Flux. I can open a separate issue for it.

PR #61 has addressed this issue; I'll go ahead and close this. 👍