Training freezes at 0% "train" after "gather" stage is complete
Running ART's 2048.ipynb notebook locally in Docker, skipping the first cell.
Training freezes at 0% "train" after the "gather" stage is complete. GPU utilization is at 0% in nvidia-smi.
Unsloth's Qwen3 GRPO notebook (without ART) works as expected; training there does not freeze.
NVIDIA RTX 5060 Ti
Dockerfile:
FROM quay.io/jupyter/pytorch-notebook:cuda12-python-3.12
USER root
RUN apt-get update && apt-get install -y build-essential
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt update && apt install -y cuda-toolkit
USER jovyan
RUN pip install openpipe-art==0.4.7 openpipe-art[backend]==0.4.7 --extra-index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://wheels.vllm.ai/nightly
# Blackwell fix:
RUN pip uninstall -y xformers
RUN git clone --depth=1 https://github.com/facebookresearch/xformers --recursive && cd xformers && export TORCH_CUDA_ARCH_LIST="12.0" && python setup.py install
Output from the training cell:
gather: 100%
18/18 [01:35<00:00, 3.23s/it, reward=1.19, max_value=102, board_value=187, move_number=82.4, completion_tokens=21.8]
WARNING:weave.trace.op:Warning: Traces will not be logged. Call weave.init to log your traces to a project.
(subsequent messages of this type will be suppressed)
No "val/reward" metric found in history
Packed 18 trajectories into 15 sequences of length 6144
train: 0%
0/15 [00:00<?, ?it/s]
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 10,000,000 | Num Epochs = 3 | Total steps = 30,000,000
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2
"-____-" Trainable parameters = 14,966,784 of 3,100,905,472 (0.48% trained)
Unsloth: Will smartly offload gradients to save VRAM!
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 10,000,000 | Num Epochs = 3 | Total steps = 60,000,000
O^O/ \_/ \ Batch size per device = 1 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (1 x 1 x 1) = 1
"-____-" Trainable parameters = 14,966,784 of 3,100,905,472 (0.48% trained)
Any help is appreciated.
I pushed something earlier today that may help address the hanging you're seeing. To get the latest version of openpipe-art:
uv add 'git+https://github.com/OpenPipe/ART.git#egg=openpipe-art[backend]'
No, unfortunately, it doesn't seem to change anything for me
Are you trying to run this on a B200? I'm not sure we support Blackwell yet.
Haha, I wish I had a B200. No, I use an NVIDIA RTX 5060 Ti.
Unsloth supports Blackwell and I was under the impression that ART GPU support is basically tied to Unsloth GPU support, since ART uses Unsloth under the hood.
Is that not the case?
openpipe-art 0.4.4
GPU: H200 x2
I unfortunately encounter the same phenomenon quite regularly.
As far as I can tell, in my case it results from LLMEngine crashes.
It's important to note that I use two clones of the model to run rollouts simultaneously on 2 GPUs, but these crashes always occur in the vLLM backend from the art library (I start the second vLLM instance manually and sync its LoRA weights as training goes).
Example crash logs from logs/vllm.log:
I use Qwen3 14B with thinking disabled.
For logs 1 and 2 the program got stuck at train 0%; for log 3 it got stuck at rollout 0%. Note that during rollout I use max_exceptions=4, so it's possible the exceptions for logs 1 and 2 also occurred during rollout, but max_exceptions>0 allowed the rollout to finish even though the engine itself had crashed, and then it got stuck at the training stage.
It's also important to note that, when it happens, it happens in the middle of training rather than at the beginning, so it's pretty hard to reproduce. In my experience, running the model on CPU would show the actual error clearly (instead of the cryptic CUDA messages), but since this only occurs after a while, that is quite infeasible.
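For reference, this is roughly how I call the gather (a hypothetical sketch run in an async notebook cell; rollout, model, and scenarios are my own names, and the exact gather_trajectory_groups signature may differ between openpipe-art versions):

import art

# max_exceptions lets the gather finish even if a few rollouts raise,
# which is why a crashed engine can still "complete" the gather stage
# and only show up as a hang later.
groups = await art.gather_trajectory_groups(
    (
        art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
        for scenario in scenarios
    ),
    pbar_desc="gather",
    max_exceptions=4,
)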
Log 1
...
INFO 08-22 14:42:17 [metrics.py:433] Prefix cache hit rate: GPU: 83.33%, CPU: 0.00%
INFO 08-22 14:42:22 [metrics.py:417] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 338.3 tokens/s, Running: 5 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 7.1%, CPU KV cache usage: 0.0%.
INFO 08-22 14:42:22 [metrics.py:433] Prefix cache hit rate: GPU: 83.33%, CPU: 0.00%
INFO 08-22 14:42:27 [metrics.py:417] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 334.9 tokens/s, Running: 5 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 7.5%, CPU KV cache usage: 0.0%.
INFO 08-22 14:42:27 [metrics.py:433] Prefix cache hit rate: GPU: 83.33%, CPU: 0.00%
INFO: 127.0.0.1:60158 - "POST /v1/chat/completions HTTP/1.1" 200
ERROR 08-22 14:42:34 [async_llm_engine.py:67] Engine background task failed
Traceback (most recent call last):
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 57, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 834, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 316, in __step_run_and_handle_result
result = coro.throw(exc)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/art/vllm/engine.py", line 75, in engine_step
return await _engine_step(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 757, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 355, in step_async
outputs = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 266, in execute_model_async
output = await make_async(self.execute_model)(execute_model_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 289, in __await__
yield self # This tells Task to wait for completion.
^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
future.result()
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 141, in execute_model
output = self.collective_rpc("execute_model",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/utils.py", line 2671, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 421, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 593, in execute_model
outputs = self._final_process_outputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 437, in _final_process_outputs
output.pythonize(model_input, self._copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 101, in pythonize
self._pythonize_sampler_output(input_metadata, copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 129, in _pythonize_sampler_output
self.sampler_output_ready_event.synchronize()
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/torch/cuda/streams.py", line 227, in synchronize
super().synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:60136 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:60498 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:60508 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:60520 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1881623]
Log 2
INFO 08-15 11:28:41 [metrics.py:417] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 333.7 tokens/s, Running: 4 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 3.1%, CPU KV cache usage: 0.0%.
INFO 08-15 11:28:41 [metrics.py:433] Prefix cache hit rate: GPU: 69.38%, CPU: 0.00%
ERROR 08-15 11:28:41 [async_llm_engine.py:67] Engine background task failed
Traceback (most recent call last):
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 57, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 834, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 316, in __step_run_and_handle_result
result = coro.throw(exc)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/art/vllm/engine.py", line 75, in engine_step
return await _engine_step(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 757, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 355, in step_async
outputs = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 266, in execute_model_async
output = await make_async(self.execute_model)(execute_model_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 289, in __await__
yield self # This tells Task to wait for completion.
^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
future.result()
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 141, in execute_model
output = self.collective_rpc("execute_model",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/utils.py", line 2671, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 421, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 593, in execute_model
outputs = self._final_process_outputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 437, in _final_process_outputs
output.pythonize(model_input, self._copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 101, in pythonize
self._pythonize_sampler_output(input_metadata, copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 131, in _pythonize_sampler_output
_pythonize_sampler_output(input_metadata, self.sampler_output,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 823, in _pythonize_sampler_output
) = (deferred_pythonize_logprobs(output, sampling_metadata,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 722, in deferred_pythonize_logprobs
) = get_logprobs(logprobs_tensor, sampling_metadata, sampler_result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/sampler.py", line 902, in get_logprobs
selected_logprobs = selected_logprobs.to('cpu')
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: 127.0.0.1:48508 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:52710 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:52440 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: 127.0.0.1:52392 - "POST /v1/chat/completions HTTP/1.1" 500
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1937525]
Log 3
INFO 08-08 19:20:00 [metrics.py:417] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 69.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.8%, CPU KV cache usage: 0.0%.
INFO 08-08 19:20:00 [metrics.py:433] Prefix cache hit rate: GPU: 91.78%, CPU: 0.00%
INFO: 127.0.0.1:35408 - "POST /v1/chat/completions HTTP/1.1" 200
ERROR 08-08 19:20:03 [async_llm_engine.py:67] Engine background task failed
Traceback (most recent call last):
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 57, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 834, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 316, in __step_run_and_handle_result
result = coro.throw(exc)
^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/art/vllm/engine.py", line 75, in engine_step
return await _engine_step(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 757, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 355, in step_async
outputs = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 266, in execute_model_async
output = await make_async(self.execute_model)(execute_model_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 289, in __await__
yield self # This tells Task to wait for completion.
^^^^^^^^^^
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 385, in __wakeup
future.result()
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/futures.py", line 202, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/fre.gilad/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 141, in execute_model
output = self.collective_rpc("execute_model",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/utils.py", line 2671, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 421, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 593, in execute_model
outputs = self._final_process_outputs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 437, in _final_process_outputs
output.pythonize(model_input, self._copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 101, in pythonize
self._pythonize_sampler_output(input_metadata, copy_stream,
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/vllm/worker/multi_step_model_runner.py", line 129, in _pythonize_sampler_output
self.sampler_output_ready_event.synchronize()
File "/home/fre.gilad/source/AgentDaC/.venv/lib/python3.12/site-packages/torch/cuda/streams.py", line 227, in synchronize
super().synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
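If it helps anyone dig further: the traceback itself suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of setting it from the driver/notebook process, assuming it runs before torch and vLLM are imported and that the spawned server inherits the environment:

import os

# Debug-only: forces synchronous kernel launches so the illegal memory
# access is reported at the failing call instead of at a later sync point.
# This slows generation down considerably.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# Note: TORCH_USE_CUDA_DSA is a build-time option; it only takes effect
# with a PyTorch build compiled with device-side assertions enabled.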
As far as I can tell, in my case it results from LLMEngine crashes.
I think you might have a different problem. I just checked my vllm.log and there are no crashes or errors. Everything is normal until the training stage begins; after that there are no new log lines and the training remains stuck at 0%.
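One way to see where the process is actually stuck (a sketch using only the standard library; the signal choice is arbitrary, and it only covers the notebook process, not a separately spawned vLLM server):

import faulthandler
import signal

# Run this in a cell before starting training. While training is stuck,
# `kill -USR1 <notebook pid>` from a shell dumps every thread's stack to stderr.
faulthandler.register(signal.SIGUSR1, all_threads=True)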
I have the same issue. I'm using an RTX A6000.
I've rechecked this issue with ART 0.4.9, just in case, but there is no improvement so far.
I also tried running the new 2048 notebook unmodified in Colab (on a T4) and encountered the same problem. I think this is no longer limited to Blackwell - nobody can run ART? @bradhilton
Let me try reproducing.
K, I was able to reproduce on a T4. Thank you @Aranxtonel.
Appears to be an OOM error, but it's failing silently.
Interesting, but why? This model doesn't use that much VRAM, and reducing gpu_memory_utilization doesn't seem to help either. Could it be excessive memory allocation or some memory leak?
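For context, gpu_memory_utilization is the fraction of VRAM the vLLM engine reserves up front for weights plus KV cache. A standalone way to sanity-check it outside ART with plain vLLM (the model name and values are just placeholders):

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder; use whatever base model you train
    gpu_memory_utilization=0.6,        # lower this if the engine OOMs at startup
    max_model_len=6144,
)
print(llm.generate(["2 + 2 ="])[0].outputs[0].text)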
Going through the same issue.
I'm wondering whether this is related to huggingface/trl#3933.
Same issue with the latest code.
Well, after trying the new 0.5.0 version with the old notebook (which still supports local execution), I can say it somewhat works. Training no longer freezes, and I managed to finish local training after a couple of restarts and to complete a few training steps on Colab.
Unfortunately, it only somewhat works, because it randomly throws the errors below; restarting the notebook allows training to continue from the previous checkpoint (a resume sketch follows the logs).
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-11' coro=<LocalBackend._monitor_openai_server() done, defined at /usr/local/lib/python3.12/dist-packages/art/local/backend.py:287> exception=NotFoundError("Error code: 404 - {'detail': 'Not Found'}")>
Traceback (most recent call last):
File "/usr/lib/python3.12/asyncio/tasks.py", line 314, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/art/local/backend.py", line 332, in _monitor_openai_server
raise e
File "/usr/local/lib/python3.12/dist-packages/art/local/backend.py", line 322, in _monitor_openai_server
await openai_client.models.retrieve(
File "/usr/local/lib/python3.12/dist-packages/openai/resources/models.py", line 182, in retrieve
return await self._get(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1730, in get
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1584, in request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'detail': 'Not Found'}
"./.art/2048-multi-turn/models/agent-002/history.jsonl" not found
and
AssertionError Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/unsloth_zoo/vllm_utils.py in load_vllm(model_name, config, gpu_memory_utilization, max_seq_length, dtype, training, float8_kv_cache, random_state, enable_lora, max_lora_rank, max_loras, use_async, use_engine, disable_log_stats, enforce_eager, enable_prefix_caching, compilation_config, conservativeness, max_logprobs, use_bitsandbytes, unsloth_vllm_standby, is_vision_model, return_args)
1660 if use_async:
-> 1661 llm = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**engine_args))
1662 elif use_engine:
... (21 frames omitted)
AssertionError: Sleep mode can only be used for one instance per process.
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/unsloth_zoo/vllm_utils.py in load_vllm(model_name, config, gpu_memory_utilization, max_seq_length, dtype, training, float8_kv_cache, random_state, enable_lora, max_lora_rank, max_loras, use_async, use_engine, disable_log_stats, enforce_eager, enable_prefix_caching, compilation_config, conservativeness, max_logprobs, use_bitsandbytes, unsloth_vllm_standby, is_vision_model, return_args)
1688 )
1689 else:
-> 1690 raise RuntimeError(error)
1691 pass
1692 pass
RuntimeError: Sleep mode can only be used for one instance per process.
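For anyone hitting the same thing, this is roughly how I resume after a restart (a minimal sketch in an async notebook cell; the names mirror the 2048 notebook, and the base_model value is an assumption):

import art
from art.local import LocalBackend

backend = LocalBackend()
model = art.TrainableModel(
    name="agent-002",
    project="2048-multi-turn",
    # assumption: whichever base model your copy of the notebook uses
    base_model="Qwen/Qwen2.5-3B-Instruct",
)
# Re-registering in a fresh kernel picks up the latest checkpoint under
# ./.art/2048-multi-turn/models/agent-002/, so training continues from
# the previous step (in my experience).
await model.register(backend)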
I'm not quite sure whether I should open a new issue for this, since it appears to be a continuation of the mentioned training problem in the same environments.
RuntimeError: Sleep mode can only be used for one instance per process.
In my experience, this is usually raised due to insufficient GPU memory.
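A quick way to confirm how much VRAM is actually free before the engine starts (assumes a CUDA-capable PyTorch install):

import torch

# Returns (free, total) device memory in bytes for the current GPU.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")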
I still encounter this problem with the latest ART version (0.5.1). Does anyone know a workaround?