RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments during vLLM model initialization
Description
Summary
When training a LangGraph agent with `openpipe-art[backend,langgraph]`, the process fails at model initialization with:
`RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.`
The error is raised inside vLLM while it allocates the model's CUDA weight parameters.
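The conflict appears to be between PyTorch's `expandable_segments` allocator option and the `torch.cuda.MemPool` that vLLM allocates weights into. Below is a minimal sketch that should reproduce the error outside of openpipe-art, assuming (as the traceback suggests) that the check fires when allocating inside a MemPool while expandable segments are enabled:

```python
# Minimal repro sketch (assumptions: a CUDA GPU and torch ~2.7).
import os

# Enable expandable segments *before* torch initializes its allocator.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402

pool = torch.cuda.MemPool()
with torch.cuda.use_mem_pool(pool):
    # Allocating into the pool should raise:
    # RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
    x = torch.empty(1024, device="cuda")
```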
Environment
- OS: Linux
- GPUs: 2x NVIDIA L4 (23 GB each)
- CUDA: 12.4 (`nvcc --version` shows `Cuda compilation tools, release 12.4, V12.4.131`)
- NVIDIA driver: 550.90.07
- Python: 3.12.x (venv managed with `uv`)
- Installed via: `pip install openpipe-art[backend,langgraph]`
- Dependency versions (from `uv.lock`):
  - torch==2.7.1
  - vllm==0.10.0
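The pins above can be double-checked from inside the venv; a minimal sketch using only standard torch/vLLM attributes and `nvidia-smi` (nothing here is specific to openpipe-art):

```python
# Print the exact torch / vLLM / CUDA versions in the active venv.
import subprocess

import torch
import vllm

print("torch:", torch.__version__)          # expected: 2.7.1
print("vllm:", vllm.__version__)            # expected: 0.10.0
print("torch built for CUDA:", torch.version.cuda)

# List the visible GPUs (2x NVIDIA L4 on this machine).
print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)
```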
Steps to reproduce
- Create a new Python 3.12 virtual environment.
- Run `uv add "openpipe-art[backend,langgraph]>=0.4.11"`.
- Run training (which calls `art.model.register()`; a sketch of this setup follows the list).
- Observe the crash at model initialization.
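A hypothetical sketch of the failing setup, based on the steps above. The names (`TrainableModel`, `LocalBackend`, the import path, and the model/project strings) are assumptions from memory of the openpipe-art README and may not match this version exactly; the relevant point is only that `register()` triggers vLLM model initialization, which crashes:

```python
# Hypothetical reproduction sketch -- API names are assumptions, see lead-in.
import asyncio

import art
from art.local import LocalBackend  # local backend from the [backend] extra (assumed import path)


async def main() -> None:
    model = art.TrainableModel(
        name="langgraph-agent",                  # assumed name
        project="repro",                         # assumed project
        base_model="Qwen/Qwen2.5-7B-Instruct",   # assumed base model
    )
    backend = LocalBackend()
    # Crashes here: registering spawns vLLM, which raises the MemPool error
    # while allocating weight Parameters during model initialization.
    await model.register(backend)


asyncio.run(main())
```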
Logs
File ".../vllm/model_executor/layers/vocab_parallel_embedding.py", line 34, in init
weight = Parameter(torch.empty(sum(output_partition_sizes), ...))
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
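A possible workaround, untested on this setup: explicitly force expandable segments off in the allocator config before torch is imported anywhere in the process (or export the same variable in the shell that launches training), so vLLM's MemPool never sees the unsupported combination. This may not help if vLLM re-enables the option internally:

```python
# Untested workaround sketch: disable expandable segments for the CUDA allocator.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:False"

import torch  # noqa: E402  -- the import must come after the env var is set
```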
Request
- Please confirm if the current pinned torch (2.7.1) + vllm (0.10.0) combination is expected to work with CUDA 12.4 / L4 GPUs.
- If not, could you provide a tested torch/vllm/xformers pinset for CUDA 12.4?
- Alternatively, handle this error in vLLM (or document required versions) so users don’t hit this blocker.
Happy to provide full logs (`pip freeze`, `nvcc`, etc.) if needed.
Same issue here, any progress?
@du00cs I switched to the SkyPilot backend instead of the local backend, and I'm currently able to train there.