sourcegraph/src-cli

batches: add a Docker scheduler

LawnGnome opened this issue

From "What happens when good Dockers go bad?":

We might not be able to predict the future resource usage of a container, but we can observe the usage of the containers that we’ve already started. We could use that to dynamically adjust the maximum number of parallel jobs down if it appears that we’re trending towards a memory exhaustion scenario. (Or adjust them up if there’s lots of idle CPU and free memory!)
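As a rough illustration of the adjust-down/adjust-up idea, here is a minimal Go sketch. It assumes something else is already sampling host memory and CPU usage as fractions of total capacity; the watermarks, the `hostUsage` type, and `nextParallelism` are all hypothetical and not part of src-cli.

```go
package main

import "fmt"

// Hypothetical thresholds; none of these values come from src-cli.
const (
	highMemWatermark = 0.85 // at or above this memory fraction, back off
	lowMemWatermark  = 0.50 // below this memory fraction we may grow...
	idleCPUFraction  = 0.50 // ...but only if CPU usage is also below this
)

// hostUsage is a hypothetical snapshot of observed host resource usage.
type hostUsage struct {
	MemUsedFraction float64 // memory in use / total host memory
	CPUUsedFraction float64 // CPU in use / total CPU capacity
}

// nextParallelism returns the new maximum number of parallel jobs, moving
// one step at a time and clamping the result to [minJobs, maxJobs].
func nextParallelism(current, minJobs, maxJobs int, u hostUsage) int {
	next := current
	switch {
	case u.MemUsedFraction >= highMemWatermark:
		// Heading towards memory exhaustion: allow one fewer job.
		next = current - 1
	case u.MemUsedFraction < lowMemWatermark && u.CPUUsedFraction < idleCPUFraction:
		// Plenty of free memory and idle CPU: allow one more job.
		next = current + 1
	}
	if next < minJobs {
		next = minJobs
	}
	if next > maxJobs {
		next = maxJobs
	}
	return next
}

func main() {
	fmt.Println(nextParallelism(4, 1, 8, hostUsage{MemUsedFraction: 0.9, CPUUsedFraction: 0.7})) // 3
	fmt.Println(nextParallelism(4, 1, 8, hostUsage{MemUsedFraction: 0.3, CPUUsedFraction: 0.2})) // 5
	fmt.Println(nextParallelism(4, 1, 8, hostUsage{MemUsedFraction: 0.6, CPUUsedFraction: 0.2})) // 4
}
```

Moving by a single job per observation keeps the feedback loop slow and predictable, which seems preferable to jumping straight to a computed target while usage numbers are still noisy.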

Implementing a full-blown scheduler might be overkill, but there are some very basic heuristics that we could start with here. The main drawback is that we'd probably have to slow the spawning of the initial set of containers to measure what happens (since it's unhelpful to start a thundering herd that exhausts memory before we can do anything about it), so we'd probably only want to do this if there were a significant number of workspaces and steps.

If we do want to invest after actioning the earlier options, I think I'd want to start with something super simple: monitor docker stats, spawn a container every couple of seconds, and only ever adjust parallelism down, based on the average memory usage per container. I don't really want to reinvent a full-blown autoscaler here.
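The super simple version might look something like the following. It samples `docker stats --no-stream --format '{{.MemPerc}}'` and caps parallelism so that the job count times the observed average per-container usage stays within a memory budget. It assumes containers run without a memory limit, so each MemPerc value is relative to host memory; `parseMemPerc`, `sampleDockerStats`, `maxParallel`, and the 0.9 budget are hypothetical, and the spawn loop itself is left out.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseMemPerc parses output from
//
//	docker stats --no-stream --format '{{.MemPerc}}'
//
// which prints one value such as "12.34%" per container, and returns each
// value as a fraction (0.1234).
func parseMemPerc(out string) ([]float64, error) {
	var fracs []float64
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSuffix(line, "%"), 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		fracs = append(fracs, v/100)
	}
	return fracs, sc.Err()
}

// sampleDockerStats takes a single snapshot via the docker CLI. Without
// container arguments, docker stats reports every running container on the
// host; passing our own container IDs as extra arguments would narrow it.
func sampleDockerStats(ctx context.Context) ([]float64, error) {
	cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream", "--format", "{{.MemPerc}}")
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("running docker stats: %w", err)
	}
	return parseMemPerc(string(out))
}

// maxParallel only ever adjusts down: it returns the largest n <= configured
// such that n * (average per-container usage) <= budget, with a floor of 1 so
// progress never stalls completely.
func maxParallel(configured int, usage []float64, budget float64) int {
	if len(usage) == 0 {
		return configured // nothing observed yet
	}
	var sum float64
	for _, u := range usage {
		sum += u
	}
	avg := sum / float64(len(usage))
	if avg <= 0 {
		return configured
	}
	fit := int(budget / avg)
	if fit < 1 {
		fit = 1
	}
	if fit < configured {
		return fit
	}
	return configured
}

func main() {
	// Canned output so the example runs without a Docker daemon; in real use
	// this would come from sampleDockerStats on each spawn tick.
	usage, err := parseMemPerc("10.00%\n30.00%\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(maxParallel(8, usage, 0.9)) // 4
}
```

Because it never raises the limit above the user's configured parallelism, this stays a safety valve rather than an auto-scaler.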