theislab/batchglm

Stop using multiprocessing for fitting dispersion models earlier

le-ander opened this issue · 1 comment

In my experience, the multiprocessing overhead when fitting dispersion models is much larger than the current code assumes.

I.e., fitting the last 50 or so models still takes a long time, and as soon as multiprocessing is switched off for those final models, things become a lot faster. Maybe multiprocessing should only be used when there are more than 10x as many genes left as processors, here:

```python
if nproc > 1 and len(idx_update) > nproc:
```

So something like:

```python
if nproc > 1 and len(idx_update) > 10 * nproc:
```
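The proposed condition could be factored into a small predicate, sketched below. `use_multiprocessing` and `genes_per_proc` are hypothetical names, not part of batchglm. Setting `genes_per_proc=1` reproduces the current check, and the default `genes_per_proc=10` gives the proposed one.

```python
def use_multiprocessing(n_remaining: int, nproc: int, genes_per_proc: int = 10) -> bool:
    """Decide whether the remaining dispersion fits should run in parallel.

    Hypothetical helper (not batchglm API): parallelise only when more than
    `genes_per_proc` genes per worker process are left, so that the pool
    overhead is spread over enough work.
    """
    return nproc > 1 and n_remaining > genes_per_proc * nproc


# On 8 cores: the current rule (genes_per_proc=1) still parallelises 9 genes,
# while the proposed rule (genes_per_proc=10) stays sequential up to 80 genes.
print(use_multiprocessing(9, 8, genes_per_proc=1))
print(use_multiprocessing(50, 8))
print(use_multiprocessing(81, 8))
```

Keeping the factor as a parameter makes it easy to tune, since the best value likely depends on the machine and on the per-gene fitting cost.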

To provide some numbers: on 8 cores, the last iteration that still uses multiprocessing (fitting about 9 or 10 genes) takes 16 s, while the next iteration (no multiprocessing, so 7 or 8 genes) takes 2 s.
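A back-of-envelope cost model can put these timings in context. The sketch below assumes (this is not measured) that each parallel iteration pays a fixed overhead regardless of how many genes are left, and that every gene takes the same time to fit; both parameters are rough estimates inferred from the two timings above, and all names are hypothetical.

```python
import math

NPROC = 8

# Rough per-gene cost from the sequential iteration: 2 s for 7-8 genes (midpoint 7.5).
t_gene = 2.0 / 7.5

# Assumed model: a parallel iteration costs a fixed overhead plus one t_gene per
# "round" of nproc genes. 9 or 10 genes on 8 cores need 2 rounds either way,
# so the 16 s iteration implies an overhead of about 16 - 2 * t_gene seconds.
overhead = 16.0 - math.ceil(10 / NPROC) * t_gene


def sequential_time(n: int, t: float) -> float:
    """Estimated wall-clock time to fit n genes one after another."""
    return n * t


def parallel_time(n: int, t: float, nproc: int, overhead_s: float) -> float:
    """Estimated wall-clock time to fit n genes on nproc worker processes."""
    return overhead_s + math.ceil(n / nproc) * t


def break_even(t: float, nproc: int, overhead_s: float, n_max: int = 10_000):
    """Smallest number of remaining genes for which the parallel path is faster."""
    for n in range(1, n_max + 1):
        if parallel_time(n, t, nproc, overhead_s) < sequential_time(n, t):
            return n
    return None


print(f"per-gene time ~ {t_gene:.2f} s, overhead ~ {overhead:.1f} s")
print("break-even ~", break_even(t_gene, NPROC, overhead), "genes")
```

Under this crude model, the break-even point comes out at roughly 70 genes on 8 cores. That is the same order of magnitude as the proposed `10 * nproc = 80` threshold, and it is consistent with the last 50 or so models being slow in parallel. If the overhead instead grows with the number of genes (for example, from pickling per-gene data), the estimate would shift.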