BobBuildTool/bob

Build steps can get finished with large delay in multi job mode

gruberchr opened this issue · 5 comments

If Bob runs in multi job mode (with option -j), task scheduling is active, which currently (Bob v0.23) has some unexpected side effects.

When the script of a build step is finished, this is indicated on the TTY interface by shifting the corresponding line

[    ] BUILD ...

from the stage of running jobs to the stage of finished jobs. However, this only shows that the task running the build script has finished, not that the complete build step has finished. To finish the build step, the audit trail still has to be generated (unless Bob is invoked with --no-audit). This is an independent task, indicated on the TTY interface with a separate line

[    ] AUDIT ...

Currently, the audit trail generation tasks are scheduled only after all build scripts that can be executed independently. In a configuration with many independent, long-running build steps, this can be very late. It also means that each of these independent build steps finishes only after all the other build scripts have finished and the audit trail generation tasks have run.
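To illustrate the effect, here is a toy simulation of the two scheduling orders. It is only a sketch and not Bob's actual scheduler: the function `simulate`, its parameters, and the assumption that every audit takes one time unit are all made up for this example.

```python
import heapq


def simulate(steps, jobs, audit_first):
    """Return {step name: time at which the whole build step is finished}.

    steps:       dict mapping a (hypothetical) step name to the duration of
                 its build script; every audit is assumed to take 1 time unit.
    jobs:        number of parallel worker slots (like -j), must be >= 1.
    audit_first: if True, a pending audit is started before any new build
                 script; if False, audits wait until no build script is left.
    """
    pending_builds = list(steps)   # build scripts not started yet (FIFO)
    pending_audits = []            # audits whose build script has finished
    running = []                   # heap of (end_time, seq, kind, name)
    finished = {}
    now, seq = 0, 0
    while pending_builds or pending_audits or running:
        # Fill all free worker slots.
        while len(running) < jobs and (pending_builds or pending_audits):
            take_audit = bool(pending_audits) and (audit_first or not pending_builds)
            if take_audit:
                name = pending_audits.pop(0)
                heapq.heappush(running, (now + 1, seq, "audit", name))
            else:
                name = pending_builds.pop(0)
                heapq.heappush(running, (now + steps[name], seq, "build", name))
            seq += 1
        # Advance time to the next task that completes.
        now, _, kind, name = heapq.heappop(running)
        if kind == "build":
            pending_audits.append(name)   # script done, step not yet finished
        else:
            finished[name] = now          # audit done, step really finished
    return finished


steps = {"a": 10, "b": 10, "c": 10, "d": 10}
print(simulate(steps, jobs=2, audit_first=False))  # {'a': 21, 'b': 21, 'c': 22, 'd': 22}
print(simulate(steps, jobs=2, audit_first=True))   # {'a': 11, 'b': 11, 'c': 22, 'd': 22}
```

In this toy model the total build time is the same in both cases, but with audits deferred, steps "a" and "b" only count as finished at time 21 although their scripts were done at time 10. Cancelling anywhere in between would leave them unfinished.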

This has at least the following observed side effects:

  1. When the build process is cancelled and resumed with option --resume, the complete build step is restarted if its audit trail generation task has not run yet, even if the build script had already finished. This can be very annoying in large builds with long-running build scripts.
  2. The progress indicator, which shows the number and percentage of done tasks, is updated too late. It is only updated when the build step has finished, not when the build script has finished.

Thanks for the detailed report. I'll look into it. Because this touches the core build logic it won't be an easy fix, though.

It took a bit longer than anticipated but the attached PR should hopefully fix the problem. Please give it a try on your side...

Great, the issue seems to be fixed by #550. That helps a lot!

However, I now observe another issue with PR #550. To cancel Bob in multi job mode, I have to press Ctrl+C twice. The first Ctrl+C only cancels the running jobs, and new jobs are still started. Bob is fully canceled only after the second Ctrl+C. Moreover, after the second Ctrl+C, Bob no longer prints the usual output that appears without PR #550:

Build error: Canceled by user!
Run again with '--resume' to skip already built packages.

Commit 7228fb7 seems to fix this issue.

Thanks for testing.