OpenPipe/ART

RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments during vLLM model initialization

Summary

When training a LangGraph agent with openpipe-art[backend,langgraph], the process fails at model initialization with the following error:

RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.

The error is raised inside vLLM while it allocates CUDA parameter tensors during model initialization.

Environment

  • OS: Linux
  • GPUs: 2x NVIDIA L4 (23 GB each)
  • CUDA: 12.4 (nvcc --version shows Cuda compilation tools, release 12.4, V12.4.131)
  • NVIDIA driver: 550.90.07
  • Python: 3.12.x (venv with uv)
  • Installed via: pip install "openpipe-art[backend,langgraph]"
  • Dependency versions (from uv.lock):
    • torch==2.7.1
    • vllm==0.10.0

Steps to reproduce

  1. Create a new Python 3.12 virtual environment.
  2. uv add "openpipe-art[backend,langgraph]>=0.4.11"
  3. Run training (which calls art.model.register(); a minimal sketch is shown after this list).
  4. Observe the crash at model initialization.
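
For reference, a minimal sketch of the failing call path, following the standard ART quickstart pattern (the model name, project name, and base model below are placeholders, not the exact values from my run):

  # Minimal repro sketch; names are placeholders, API per the ART quickstart.
  import asyncio
  import art
  from art.local import LocalBackend

  async def main():
      model = art.TrainableModel(
          name="agent-repro",                     # placeholder
          project="langgraph-demo",               # placeholder
          base_model="Qwen/Qwen2.5-7B-Instruct",  # placeholder
      )
      backend = LocalBackend()
      await model.register(backend)  # crashes here during vLLM model init

  asyncio.run(main())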

Logs

File ".../vllm/model_executor/layers/vocab_parallel_embedding.py", line 34, in init
weight = Parameter(torch.empty(sum(output_partition_sizes), ...))
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
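
Possibly relevant context: as far as I can tell, PyTorch raises this error when torch.cuda.MemPool is used while the caching allocator is configured with expandable_segments:True. An untested mitigation sketch, assuming the flag is being injected through PYTORCH_CUDA_ALLOC_CONF by some dependency, would be to strip it before torch is first imported:

  # Untested workaround sketch: drop expandable_segments from the allocator
  # config before torch/vLLM are imported (assumption: a dependency sets it).
  import os

  conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
  os.environ["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(
      opt for opt in conf.split(",") if opt and "expandable_segments" not in opt
  )

  import torch  # import torch only after adjusting the env var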

Request

  • Please confirm whether the currently pinned torch (2.7.1) + vllm (0.10.0) combination is expected to work with CUDA 12.4 on L4 GPUs.
  • If not, could you provide a tested torch/vllm/xformers pinset for CUDA 12.4?
  • Alternatively, handle this error when initializing vLLM (or document the required versions) so users don’t hit this blocker.

Happy to provide full logs (pip freeze, nvcc, etc.) if needed.
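
If it helps triage, the version details above can be reproduced with a quick dump (standard torch/vllm attributes only):

  # Quick environment dump using standard attributes.
  import torch
  import vllm

  print("torch:", torch.__version__)
  print("vllm:", vllm.__version__)
  print("cuda (torch build):", torch.version.cuda)
  print("gpu:", torch.cuda.get_device_name(0))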

Same issue here, any progress?

@du00cs I switched to the SkyPilot backend instead of the local backend, and I'm currently able to train there.

Thanks for the great suggestion.