[BUG] Importing spacy before cluster creation leads to only 1 GPU being used.
ayushdg opened this issue · 0 comments
Describe the bug
If a user or NeMo-Curator imports spacy, or any module that transitively imports thinc, before cluster creation, only 1 of the available GPUs on the system may end up being used.
To avoid this, any module that imports thinc should ideally be imported after cluster creation (get_client).
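A minimal sketch of a guard that could run just before the cluster is created, to catch the problematic import order early. The function names and the module list are hypothetical and not part of NeMo-Curator or dask_cuda; the check only inspects sys.modules, so it is conservative and cannot tell whether a CUDA context was actually created.

```python
import sys

# Assumption: illustrative list of modules known (from this issue) to touch
# CUDA at import time. Extend it as needed for your environment.
CONTEXT_CREATING_MODULES = ("spacy", "thinc")


def find_early_imports(module_names=CONTEXT_CREATING_MODULES, loaded=None):
    """Return the listed modules that are already imported."""
    # `loaded` defaults to sys.modules; it is a parameter so the check can be tested.
    loaded = sys.modules if loaded is None else loaded
    return [name for name in module_names if name in loaded]


def check_before_cluster_creation(module_names=CONTEXT_CREATING_MODULES, loaded=None):
    """Raise if any listed module was imported before the cluster exists."""
    early = find_early_imports(module_names, loaded)
    if early:
        raise RuntimeError(
            "Imported before cluster creation: " + ", ".join(early)
            + ". Import these after the cluster/client is created."
        )
```

Calling check_before_cluster_creation() immediately before LocalCUDACluster(...) or get_client would then fail fast instead of silently running on a single GPU.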
Steps/Code to reproduce bug
import spacy
from dask_cuda import LocalCUDACluster
if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_async=True, rmm_pool_size="2GiB")
    breakpoint()
Only GPU 0 is used.
Note: This only reproduces when run as a Python script, not in Jupyter/IPython.
The root cause seems to be this section:
https://github.com/explosion/thinc/blob/main/thinc/compat.py#L15
which is equivalent to the following:
import cupy as cp
cp.cuda.runtime.getDeviceCount()  # commenting this out works
from dask_cuda import LocalCUDACluster
if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_async=True, rmm_pool_size="2GiB")
    breakpoint()
Expected behavior
The core issue is that similar situations may arise whenever any library creates a primary CUDA context before cluster creation. Ideally, libraries should not create a CUDA context during import (or should offer an option not to), and should only run checks that involve context creation at runtime.
Until then, Curator might need to handle these imports manually in a separate manner, or implement some form of lazy loading.
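One possible shape for the lazy loading mentioned above, as a sketch only: a small proxy that defers the real import until the first attribute access. LazyModule is a hypothetical name, not an existing NeMo-Curator or spacy API, and the example uses the standard-library json module as a stand-in so it runs without a GPU.

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is imported yet

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself,
        # i.e. attributes that belong to the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Stand-in demonstration: `json` is not imported until `dumps` is looked up.
lazy_json = LazyModule("json")
encoded = lazy_json.dumps({"a": 1})
```

Under this approach, a module-level spacy = LazyModule("spacy") would not trigger the thinc import until spacy is first used, which can then happen after the cluster has been created. Whether this fits Curator's code paths is an assumption that would need checking.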
Environment overview (please complete the following information)
- Environment location: Bare metal
- Method of NeMo-Curator install: from source (pip install .[cuda12x])
- If method of install is [Docker], provide docker pull & docker run commands used