broadinstitute/CellBender

WISH: Please respect CGroups CPU limits for parallelization, or default to single-core processing

CellBender lets pytorch use all CPU cores on the host by default, per https://cellbender.readthedocs.io/en/latest/reference/index.html:

--cpu-threads

    Number of threads to use when pytorch is run on CPU. Defaults to the number of logical cores.

Where I'm coming from: I think any software tool that may run in a multi-user environment should default to single-threaded, single-process execution. Defaulting to all CPU cores on the host wreaks havoc when other users or processes are running at the same time, so I argue that single-core processing is the safest default. Users who have access to more CPU resources can then actively request them.

If a software tool really wants to run in parallel by default, I argue it should respect what the system has allotted to the process. CGroups CPU limits are one such allocation, and they are becoming more and more common: you see them in HPC environments, in cloud services, etc. If CGroups limits each user/session/process to, say, four cores, and the machine has 192 cores, the current behavior of CellBender causes it to run 192 threads that compete for 4 cores, resulting in heavy context switching and a major slowdown. Many users are unaware that this is happening, so they are already suffering from it today, and may conclude that CellBender is slower than it actually is.
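For illustration, here is a minimal sketch of how a CGroups-aware default could be computed in Python. This is not CellBender code; the function names (`parse_cpu_max`, `cgroup_cpu_limit`, `available_cpus`) are hypothetical. It assumes Linux with the cgroup filesystem mounted at `/sys/fs/cgroup` and the limits of the current process visible at that root, which is typical inside containers. On hosts with nested cgroups, the path would have to be resolved via `/proc/self/cgroup` first.

```python
import math
import os
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse "<quota> <period>" (cgroup v2 cpu.max format) into a core count.

    Returns None when no quota is set ("max", or -1 as used by cgroup v1)
    or when the content cannot be parsed.
    """
    fields = text.split()
    if len(fields) != 2 or fields[0] == "max":
        return None
    try:
        quota, period = int(fields[0]), int(fields[1])
    except ValueError:
        return None
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def cgroup_cpu_limit(root: str = "/sys/fs/cgroup") -> Optional[float]:
    """Return the CPU quota in cores, or None if no limit is found.

    Assumption: the limits of this process are exposed directly under `root`.
    """
    base = Path(root)
    try:
        # cgroup v2: a single file holding "<quota> <period>" or "max <period>"
        v2 = base / "cpu.max"
        if v2.is_file():
            return parse_cpu_max(v2.read_text())
        # cgroup v1: quota and period live in separate files
        quota_file = base / "cpu" / "cpu.cfs_quota_us"
        period_file = base / "cpu" / "cpu.cfs_period_us"
        if quota_file.is_file() and period_file.is_file():
            return parse_cpu_max(
                f"{quota_file.read_text().strip()} {period_file.read_text().strip()}"
            )
    except OSError:
        pass
    return None


def available_cpus(root: str = "/sys/fs/cgroup") -> int:
    """Number of CPUs this process may use, never less than 1."""
    if hasattr(os, "sched_getaffinity"):
        # Respects taskset/cpuset-style affinity masks (Linux only)
        n = len(os.sched_getaffinity(0))
    else:
        n = os.cpu_count() or 1
    limit = cgroup_cpu_limit(root)
    if limit is not None:
        # Round fractional quotas up, so a 1.5-core quota allows 2 threads
        n = min(n, math.ceil(limit))
    return max(1, n)
```

The result could then be handed to `torch.set_num_threads()` when `--cpu-threads` is not given, instead of the number of logical cores.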

I also think the default should be capped at a fixed number of CPU cores. Hosts keep getting more cores, and with machines that have 192 or 256 cores, defaulting to the number of hardware CPU cores risks using so many threads that the result is a slowdown rather than a speed-up.
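A capped default could look roughly like the sketch below. The helper name `resolve_cpu_threads` and the cap of 8 are assumptions made up for this example, not existing CellBender behavior or a recommended value. An explicit user request always wins; otherwise the detected core count is clamped to the cap.

```python
import os
from typing import Optional

# Hypothetical cap, chosen only for illustration
DEFAULT_MAX_THREADS = 8


def resolve_cpu_threads(
    requested: Optional[int] = None,
    detected: Optional[int] = None,
    cap: int = DEFAULT_MAX_THREADS,
) -> int:
    """Pick a thread count: explicit request first, else a capped default."""
    if requested is not None:
        if requested < 1:
            raise ValueError("requested thread count must be >= 1")
        return requested  # the user actively asked for this many threads
    if detected is None:
        detected = os.cpu_count() or 1
    # Never exceed the cap by default, and never go below one thread
    return max(1, min(detected, cap))
```

With this, a 192-core host would default to 8 threads, a 2-core host to 2, and a user passing an explicit value gets exactly what was asked for.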

PS. This is based on observations in an HPC environment with 1000+ different users.