Allow running multiple `art.LocalBackend` sessions on different processes

Question

Opened this issue 2 months ago · 0 comments

Currently it is impossible to run several `LocalBackend(in_process=False)` sessions at the same time in different processes.

The current implementation of `LocalBackend` runs `pkill -9 model-service`, which kills every "model-service" process system-wide. As a result, if I launch a `LocalBackend` session in one process and start training, and then launch another `LocalBackend` session in a second process, the second launch kills the first session's run.
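To illustrate the kind of scoping I have in mind, here is a minimal sketch that stops only the child process a session started itself, rather than matching by name across the whole system. `OwnedService` and `sleeper_command` are hypothetical names made up for this example; they are not part of ART, and I don't know how `LocalBackend` actually spawns "model-service".

```python
import subprocess
import sys


class OwnedService:
    """Hypothetical wrapper (not an ART class): owns one child process."""

    def __init__(self, args):
        # Keep the handle so cleanup can target this exact process.
        self.proc = subprocess.Popen(args)

    def is_running(self):
        # poll() returns None while the child is still alive.
        return self.proc.poll() is None

    def stop(self, timeout=5.0):
        """Stop only the child started by this instance."""
        if self.proc.poll() is not None:
            return  # already exited
        self.proc.terminate()  # polite request first
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # force-kill this one child only
            self.proc.wait()


def sleeper_command():
    """Stand-in for a long-running service, used only for the demo."""
    return [sys.executable, "-c", "import time; time.sleep(30)"]
```

With two such services started from `sleeper_command()`, calling `stop()` on the first leaves the second running, which is the behavior I would expect from two independent `LocalBackend` sessions.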

My use case: I run several training scripts, each of which launches a `LocalBackend` automatically, with its own model, setup, and dedicated GPUs.
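For context, this is roughly how I pin each script to its own GPUs. It is a simplified sketch: `launch_on_gpus` is a hypothetical helper of mine, and the only real mechanism it relies on is the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os
import subprocess


def launch_on_gpus(script_args, gpu_ids, **popen_kwargs):
    """Hypothetical helper: start a script that can only see the given GPUs."""
    env = os.environ.copy()
    # CUDA applications only enumerate the devices listed here.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return subprocess.Popen(script_args, env=env, **popen_kwargs)
```

Each training script is started this way with a disjoint set of GPU ids, so the sessions never share hardware and have no reason to interfere with each other.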

Also, given the recent improvements made to closing `LocalBackend` (see #126), is running `pkill -9 model-service` even still necessary?

Note: it is possible to launch several `LocalBackend(in_process=True)` sessions, but in that case they suffer a performance penalty because the Unsloth optimizations are not applied (i.e., a 2-3x slowdown compared to `LocalBackend(in_process=False)`).