NVIDIA/NeMo-Curator

[BUG] Importing spacy before cluster creation leads to only 1 GPU being used.

ayushdg opened this issue

Describe the bug

If a user (or NeMo-Curator itself) imports spacy, or any module that transitively imports thinc, before the cluster is created, only one of the available GPUs on the system may end up being used.
To avoid this, any module that imports thinc should ideally be imported after cluster creation (i.e., after get_client).
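One way to catch this import-order mistake early is a cheap guard that runs just before the cluster is created. The sketch below is hypothetical: neither `check_import_order` nor `premature_imports` exists in NeMo-Curator. It only checks whether thinc has already been imported in the current process. It cannot tell whether a CUDA context was actually created.

```python
import sys

# Hypothetical guard. Per this issue, importing thinc (spaCy imports it
# transitively) before the Dask-CUDA cluster exists can leave only GPU0 in use.
DEFAULT_SUSPECTS = ("thinc",)


def premature_imports(suspects=DEFAULT_SUSPECTS):
    """Return the suspect top-level packages that are already in sys.modules."""
    # Copy the keys first so concurrent imports cannot mutate the dict mid-iteration.
    loaded = {name.split(".", 1)[0] for name in list(sys.modules)}
    return sorted(loaded.intersection(suspects))


def check_import_order(suspects=DEFAULT_SUSPECTS):
    """Raise RuntimeError if any suspect package was imported before cluster creation."""
    offenders = premature_imports(suspects)
    if offenders:
        raise RuntimeError(
            "Imported before cluster creation: "
            + ", ".join(offenders)
            + "; move these imports after get_client()."
        )
```

Calling `check_import_order()` right before `LocalCUDACluster(...)` or `get_client()` would turn the silent single-GPU behavior into an explicit error in the driver script.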

Steps/Code to reproduce bug

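```python
import spacy  # imports thinc, which probes CUDA devices at import time
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_async=True, rmm_pool_size="2GiB")
    breakpoint()  # pause here and check GPU usage (e.g. with nvidia-smi)
```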

Result: only GPU0 is used.
Note: this only reproduces when run as a Python script, not in Jupyter/IPython.

The root cause appears to be this section of thinc:
https://github.com/explosion/thinc/blob/main/thinc/compat.py#L15
which is equivalent to the following:

```python
import cupy as cp

cp.cuda.runtime.getDeviceCount()  # commenting this line out avoids the problem
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_async=True, rmm_pool_size="2GiB")
    breakpoint()  # pause here and check GPU usage (e.g. with nvidia-smi)
```
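For contrast, a library could defer this probe until it is actually needed instead of running it at import time. The sketch below uses a hypothetical `has_cupy_gpu` helper and is not thinc's actual code. It still calls `getDeviceCount()`, so it only helps if the first call happens after the cluster has been created.

```python
import functools


@functools.lru_cache(maxsize=None)
def has_cupy_gpu() -> bool:
    """Return True if CuPy is importable and at least one CUDA device is visible.

    The probe runs on the first call, not at import time, so importing the
    module that defines this function never touches CUDA. The result is cached.
    """
    try:
        import cupy as cp  # deferred: defining this function does not import CuPy
    except ImportError:
        return False
    try:
        # The same call the issue identifies, now executed only on demand.
        return cp.cuda.runtime.getDeviceCount() > 0
    except RuntimeError:
        # CuPy's CUDARuntimeError subclasses RuntimeError (e.g. no device or driver).
        return False
```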

Expected behavior

The core issue is that if any library creates a primary CUDA context before cluster creation, similar situations may arise. Ideally, libraries would not create a CUDA context during import (or would provide an option to avoid it), and would only run checks that require a context at runtime.
Until then, Curator may need to handle these imports separately, for example by deferring them until after cluster creation or by implementing some form of lazy loading.
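As one possible form of lazy loading, Python's standard `importlib.util.LazyLoader` can register a module whose body only runs on first attribute access. The `lazy_import` helper below is a hypothetical sketch adapted from the recipe in the Python `importlib` documentation. It is not an existing Curator API.

```python
import importlib.util
import sys


def lazy_import(name):
    """Register a top-level module in sys.modules without executing its body yet.

    The module's code runs on the first attribute access (e.g. module.func).
    """
    if name in sys.modules:
        return sys.modules[name]  # already imported (eagerly or lazily): reuse it
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms deferred execution; the body has not run
    return module
```

For example, `spacy = lazy_import("spacy")` at the top of a script would defer spaCy's (and therefore thinc's) import until the first `spacy.<attr>` access, such as `spacy.load(...)` after `get_client()`. Any earlier attribute access, or a `from spacy import ...` statement, still triggers the full import.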

Environment overview

  • Environment location: Bare metal
  • Method of NeMo-Curator install: from source (pip install .[cuda12x])