neka-nat/cupoch

MemoryError cph.initialize_allocator(cph.PoolAllocation, 1000000000)

ducktrA opened this issue · 5 comments

All Python examples complain about cph.PoolAllocation. Has anyone else seen this? CUDA toolkit 11.6,
NVIDIA-SMI 510.60.02, Driver Version: 510.60.02, CUDA Version: 11.6.
There is plenty of GPU memory available; the 1 GB pool fits easily. I can reduce the size drastically and still hit the same issue.

Traceback (most recent call last):
File "basic/benchmarks3.py", line 5, in
cph.initialize_allocator(cph.PoolAllocation, 1000000000)
MemoryError: std::bad_alloc: RMM failure at:/home/adolf/cupoch/third_party/rmm/include/rmm/mr/device/pool_memory_resource.hpp:179: Maximum pool size exceeded
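For reference, a minimal version of what the failing example scripts do (assuming the usual import cupoch as cph used by the cupoch examples):

import cupoch as cph

# request a ~1 GB pooled allocator; this is the call that raises the MemoryError above
cph.initialize_allocator(cph.PoolAllocation, 1000000000)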

Hi,
Does RMM work when used by itself?
https://docs.rapids.ai/api/rmm/stable/basics.html

import rmm

pool = rmm.mr.PoolMemoryResource(
    rmm.mr.ManagedMemoryResource(),
    initial_pool_size=2**30,
    maximum_pool_size=2**32
)
rmm.mr.set_current_device_resource(pool)
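If the pool is set up correctly, a subsequent device allocation should be served from it; as a quick sanity check (this extra step is an assumption, not part of the docs snippet above):

import rmm

# allocate 1 MiB through the current device resource, i.e. the pool configured above
buf = rmm.DeviceBuffer(size=2**20)
print(buf.size)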

RMM itself does work when using the rapids-ai RMM module. To my surprise, the RMM module referenced by cupoch did not compile; it complained about an outdated make. Somehow this doesn't feel right.

I have bumped the RMM version, so please try it with the latest master. (I don't use C++17, so it is not the latest RMM.)

I checked it out and built it; still the same issue. By the way, the RMM test above hit an RMM module that cupoch obviously was not compiled against.

I removed my manually installed rmm (pip uninstall rmm) and still get:

Traceback (most recent call last):
File "examples/python/basic/benchmarks.py", line 5, in
cph.initialize_allocator(cph.PoolAllocation, 1000000000)
MemoryError: std::bad_alloc: RMM failure at:/home/adolf/cupoch/third_party/rmm/include/rmm/mr/device/pool_memory_resource.hpp:179: Maximum pool size exceeded

Same on the develop branch.