ropensci/targets

Push to saturated crew controllers

wlandau opened this issue · 4 comments

In crew, a controller is "saturated" if all its workers are busy. You can push a task to a saturated controller, but the task might not actually start right away. In targets, we do not push to saturated controllers because then we would have to print "start target TARGET_NAME" and give a false impression that a target is actually running right now. So the current behavior is clear and understandable for users.

However, it is in efficient to withhold tasks from controllers. When a task is withheld, it gets pushed back to the head of the queue with queue$append0(), and then internal churn ensues. In addition, one controller can block another inside a controller group if other tasks are ready and waiting.

I propose we get rid of the saturation check here:

targets/R/class_crew.R

Lines 160 to 168 in 7115068

saturated <- self$controller$saturated(controller = resources$controller)
if (saturated) {
# Requires a slow test. Covered in the saturation tests in
# tests/hpc/test-crew_local.R # nolint
# nocov start
self$scheduler$queue$append0(name = name)
self$backoff_requeue$wait()
# nocov end
} else {

To manage expectations, we can change the term "start" to "submit"/"submitted" in reporters/progress. While we are at it, we should also changed "built" to "finish"/"finished".

That said: with the exception of controller group deadlock, this particular issue will not become a bottleneck until #1183. So I would prefer to solve #1183 first.

If we have different console messages depending on the saturation of the controller, I don't think there will be a problem. One message could be "submit target x" and another could be "submit (pending) target x", depending on whether the crew controller is saturated at the time of controller$push(). I think that's clear enough.

Output from from tar_progress() would have to be the same though. That could be mildly odd at first, but the change from "started" to "submitted" should take care of most initial confusion.

Feels like I can now work on this issue without needing #1183.