scverse/rapids_singlecell

How to control the memory of GPU with more than 1 million cells?

xflicsu opened this issue · 3 comments


When I run the following command on an A100 (40 GB) GPU, I get an out-of-memory error:

#######################################

cudata = rsc.cunnData.cunnData(adata=adata)

/public/home/amcgc/anaconda3/envs/rapids-23.12/lib/python3.10/site-packages/rapids_singlecell/cunnData/__init__.py:117: FutureWarning: cunnData is deprecated, please use AnnData with cupy arrays instead. cunnData will be removed from rapids-singlecell in early 2024.

For more info on how to transition see: https://rapids-singlecell.readthedocs.io/en/latest/Usage_Principles.html
warnings.warn(


MemoryError Traceback (most recent call last)
Cell In[9], line 1
----> 1 cudata = rsc.cunnData.cunnData(adata=adata)

File ~/anaconda3/envs/rapids-23.12/lib/python3.10/site-packages/rapids_singlecell/cunnData/__init__.py:131, in cunnData.__init__(self, adata, X, obs, var, uns, layers, obsm, varm)
129 del inter
130 else:
--> 131 self._X = sparse_gpu.csr_matrix(adata.X, dtype=cp.float32)
132 self._obs = adata.obs.copy()
133 self._var = adata.var.copy()

File ~/anaconda3/envs/rapids-23.12/lib/python3.10/site-packages/cupyx/scipy/sparse/_compressed.py:227, in _compressed_sparse_matrix.__init__(self, arg1, shape, dtype, copy)
224 elif scipy_available and scipy.sparse.issparse(arg1):
225 # Convert scipy.sparse to cupyx.scipy.sparse
226 x = arg1.asformat(self.format)
--> 227 data = cupy.array(x.data)
228 indices = cupy.array(x.indices, dtype='i')
229 indptr = cupy.array(x.indptr, dtype='i')

File ~/anaconda3/envs/rapids-23.12/lib/python3.10/site-packages/cupy/_creation/from_data.py:46, in array(obj, dtype, copy, order, subok, ndmin)
7 def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0):
8 """Creates an array on the current device.
9
10 This function currently does not support the subok option.
(...)
44
45 """
---> 46 return _core.array(obj, dtype, copy, order, subok, ndmin)

File cupy/_core/core.pyx:2376, in cupy._core.core.array()

File cupy/_core/core.pyx:2400, in cupy._core.core.array()

File cupy/_core/core.pyx:2531, in cupy._core.core._array_default()

File cupy/_core/core.pyx:132, in cupy._core.core.ndarray.__new__()

File cupy/_core/core.pyx:220, in cupy._core.core._ndarray_base._init()

File cupy/cuda/memory.pyx:740, in cupy.cuda.memory.alloc()

File ~/anaconda3/envs/rapids-23.12/lib/python3.10/site-packages/rmm/allocators/cupy.py:37, in rmm_cupy_allocator(nbytes)
34 raise ModuleNotFoundError("No module named 'cupy'")
36 stream = Stream(obj=cupy.cuda.get_current_stream())
---> 37 buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
38 dev_id = -1 if buf.ptr else cupy.cuda.device.get_device_id()
39 mem = cupy.cuda.UnownedMemory(
40 ptr=buf.ptr, size=buf.size, owner=buf, device_id=dev_id
41 )

File device_buffer.pyx:85, in rmm._lib.device_buffer.DeviceBuffer.__cinit__()

MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /public/home/amcgc/anaconda3/envs/rapids-23.12/include/rmm/mr/device/cuda_memory_resource.hpp

###########################################
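The allocation that fails is the very first one in the conversion: `cupy.array(x.data)` inside `cupyx.scipy.sparse.csr_matrix`, which copies the CSR values to the GPU before the cast to `float32` requested by `sparse_gpu.csr_matrix(adata.X, dtype=cp.float32)`. A rough back-of-the-envelope estimate of the device memory this needs can be sketched as below. This is a minimal sketch, assuming `float32` values and `int32` indices on the device (as in the frames above) and assuming the values are first copied in their source dtype (e.g. `float64`) and then cast on the GPU; the function names and the 2,000 non-zeros per cell are hypothetical, for illustration only.

```python
def estimate_csr_device_bytes(n_rows: int, nnz: int,
                              value_bytes: int = 4, index_bytes: int = 4) -> int:
    """Bytes for the three arrays of a CSR matrix: data, indices, indptr."""
    data = nnz * value_bytes             # one value per stored (non-zero) entry
    indices = nnz * index_bytes          # one column index per stored entry
    indptr = (n_rows + 1) * index_bytes  # one row pointer per row, plus one
    return data + indices + indptr


def estimate_peak_conversion_bytes(n_rows: int, nnz: int,
                                   source_value_bytes: int = 8,
                                   target_value_bytes: int = 4,
                                   index_bytes: int = 4) -> int:
    """Peak bytes if `data` is copied in its source dtype, then cast on the GPU.

    Assumption: the source-dtype copy of `data` is still alive while the
    cast result is allocated, so both exist at the same time.
    """
    final = estimate_csr_device_bytes(n_rows, nnz, target_value_bytes, index_bytes)
    if source_value_bytes == target_value_bytes:
        return final  # no cast needed, so no second copy of `data`
    return final + nnz * source_value_bytes


if __name__ == "__main__":
    n_cells = 1_000_000
    nnz = n_cells * 2_000  # hypothetical: ~2,000 non-zero genes per cell
    gib = 2**30
    print(f"float32 CSR on GPU:      {estimate_csr_device_bytes(n_cells, nnz) / gib:.1f} GiB")
    print(f"peak from float64 input: {estimate_peak_conversion_bytes(n_cells, nnz) / gib:.1f} GiB")
```

With these made-up numbers the `float64` staging copy roughly doubles the peak. Under the same assumption about CuPy's conversion path, casting on the host first (e.g. `adata.X = adata.X.astype(np.float32)`) would avoid that extra device copy.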

Hey @xflicsu

Did you try using rmm?

import cupy as cp
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

# Call this before any GPU arrays are allocated
rmm.reinitialize(
    managed_memory=True, # Allows oversubscription
    pool_allocator=False, # default is False
    devices=0, # GPU device IDs to register. By default registers only GPU 0.
)
# Route all CuPy allocations through RMM
cp.cuda.set_allocator(rmm_cupy_allocator)

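Managed memory allows oversubscription of GPU memory, i.e. allocations larger than the physical device memory, with data paged between host and device, but this comes at a performance cost. Since cunnData is deprecated, I would also encourage you to switch to AnnData for your workflow.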
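Because managed memory costs performance, one option is to enable it only when the data would not fit otherwise. A minimal sketch of that decision, assuming you already have a byte estimate for your data: `needs_oversubscription`, `configure_allocator`, and the 0.9 headroom factor are hypothetical choices for illustration, while `cupy.cuda.runtime.memGetInfo()` (free and total bytes of the current device), `rmm.reinitialize`, and `rmm_cupy_allocator` are the real calls used in the snippet above.

```python
def needs_oversubscription(required_bytes: int, free_bytes: int,
                           headroom: float = 0.9) -> bool:
    """True if the estimate exceeds a (hypothetical) safe fraction of free memory."""
    return required_bytes > free_bytes * headroom


def configure_allocator(required_bytes: int, device: int = 0) -> bool:
    """Hypothetical helper: set up RMM, enabling managed memory only if needed.

    Call it before any GPU arrays are allocated. Returns whether managed
    memory was enabled. Needs a CUDA GPU with CuPy and RMM installed.
    """
    import cupy as cp
    import rmm
    from rmm.allocators.cupy import rmm_cupy_allocator

    with cp.cuda.Device(device):
        free_bytes, _total_bytes = cp.cuda.runtime.memGetInfo()

    managed = needs_oversubscription(required_bytes, free_bytes)
    rmm.reinitialize(
        managed_memory=managed,  # oversubscribe only when the data won't fit
        pool_allocator=False,
        devices=device,
    )
    cp.cuda.set_allocator(rmm_cupy_allocator)
    return managed
```

The estimate passed in could come from something like `nnz * 8 + n_rows * 4`-style arithmetic on `adata.X`; the threshold leaves room for intermediate arrays created by later processing steps, which this sketch does not model.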

Please let me know if this helped.

Yours Severin


Thanks for your quick response!
It works now.
BTW, does "switch to AnnData" mean switching back to the CPU?

No. AnnData has supported GPU-based arrays and matrices (CuPy arrays and cupyx sparse matrices) since v0.10.0, and this is now the default way of working with rsc. The notebooks in the documentation show how it is meant to be used.
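A minimal sketch of the AnnData-based pattern that replaces `cunnData`. It assumes the `rsc.get.anndata_to_GPU` / `rsc.get.anndata_to_CPU` helpers and the `rsc.pp.normalize_total` / `rsc.pp.log1p` functions found in recent rapids-singlecell releases; `run_basic_preprocessing_on_gpu` is a hypothetical name, and the signatures should be checked against the documentation for your installed version. It needs a CUDA GPU to actually run and can be combined with the RMM setup from the reply above.

```python
def run_basic_preprocessing_on_gpu(adata):
    """Hypothetical helper: keep working on the AnnData object, with X on the GPU."""
    import rapids_singlecell as rsc

    # Convert adata.X in place from a NumPy/SciPy matrix to a CuPy/cupyx one
    rsc.get.anndata_to_GPU(adata)

    # rsc.pp functions operate directly on the AnnData object, like scanpy's
    rsc.pp.normalize_total(adata, target_sum=1e4)
    rsc.pp.log1p(adata)

    # Move X back to host memory before handing the object to CPU-only code
    rsc.get.anndata_to_CPU(adata)
    return adata
```

The design point is that `obs`, `var`, and the other AnnData slots stay where they are; only the matrix moves between host and device, so there is no separate wrapper object to convert back and forth.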