OpenPipe/ART

Unable to reach OpenAI-compatible server

Closed this issue · 10 comments

When I run 2048.ipynb, I get:
TimeoutError Traceback (most recent call last)
Cell In[4], line 50
42 backend = LocalBackend(
43 # Normally we don't want to run the server in-process, but for the output
44 # to show up properly on Google Colab we'll enable this.
45 in_process=True,
46 path="./.art",
47 )
49 # Register the model with the local Backend (sets up logging, inference, and training)
---> 50 await model.register(backend)

File /opt/miniconda/envs/art/lib/python3.11/site-packages/art/model.py:307, in TrainableModel.register(self, backend, _openai_client_config)
301 async def register(
302 self,
303 backend: "Backend",
304 _openai_client_config: dev.OpenAIServerConfig | None = None,
305 ) -> None:
306 await super().register(backend)
--> 307 base_url, api_key = await backend._prepare_backend_for_training(
308 self, _openai_client_config
309 )
311 # Populate the top-level inference fields so that the rest of the
312 # code (and any user code) can create an OpenAI client immediately.
313 self.inference_base_url = base_url

File /opt/miniconda/envs/art/lib/python3.11/site-packages/art/local/backend.py:255, in LocalBackend._prepare_backend_for_training(self, model, config)
249 async def _prepare_backend_for_training(
250 self,
251 model: TrainableModel,
252 config: dev.OpenAIServerConfig | None = None,
253 ) -> tuple[str, str]:
254 service = await self._get_service(model)
--> 255 await service.start_openai_server(config=config)
256 server_args = (config or {}).get("server_args", {})
258 base_url = f"http://{server_args.get('host', '0.0.0.0')}:{server_args.get('port', 8000)}/v1"

File /opt/miniconda/envs/art/lib/python3.11/site-packages/art/torchtune/service.py:32, in TorchtuneService.start_openai_server(self, config)
31 async def start_openai_server(self, config: dev.OpenAIServerConfig | None) -> None:
---> 32 await openai_server_task(
33 engine=await self.llm,
34 config=dev.get_openai_server_config(
35 model_name=self.model_name,
36 base_model=self.get_last_checkpoint_dir() or self.base_model,
37 log_file=f"{self.output_dir}/logs/vllm.log",
38 config=config,
39 ),
40 )

File /opt/miniconda/envs/art/lib/python3.11/site-packages/art/vllm/server.py:81, in openai_server_task(engine, config)
75 done, _ = await asyncio.wait(
76 [openai_server_task, test_client_task],
77 timeout=timeout,
78 return_when="FIRST_COMPLETED",
79 )
80 if not done:
---> 81 raise TimeoutError(
82 f"Unable to reach OpenAI-compatible server within {timeout} seconds. You can increase this timeout by setting the ART_SERVER_TIMEOUT environment variable."
83 )
84 for task in done:
85 task.result()

TimeoutError: Unable to reach OpenAI-compatible server within 1000.0 seconds. You can increase this timeout by setting the ART_SERVER_TIMEOUT environment variable.
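
The message says the timeout can be raised via the ART_SERVER_TIMEOUT environment variable; presumably that means setting it (in seconds) before model.register runs, something like:

import os

# Presumably ART reads ART_SERVER_TIMEOUT (in seconds) while waiting for the
# vLLM OpenAI-compatible server to come up, per the error message above.
# Set it before registering the model so the longer timeout takes effect.
os.environ["ART_SERVER_TIMEOUT"] = "3600"  # wait up to an hour before giving up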

Hi, @Jim2016713, in what environment are you running the notebook?
It looks like ART can’t launch the vLLM server.
Try searching for the vllm.log file:
find -type f -name vllm.log 2>/dev/null | head -n 5
Can you share it here? It should contain the specific error explaining why the server can’t start.

Same issue. Everything in the log looks normal:
max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, num_lookahead_slots=0, scheduler_delay_factor=0.0, preemption_mode=None, num_scheduler_steps=16, multi_step_stream_outputs=True, scheduling_policy='fcfs', enable_chunked_prefill=None, disable_chunked_mm_input=False, scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='vllm', override_generation_config=None, enable_sleep_mode=False, additional_config=None, enable_reasoning=False, disable_cascade_attn=False, disable_log_requests=True, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
INFO 08-13 18:10:27 [serving_models.py:185] Loaded new LoRA adapter: name 'mcp-14b-alpha-001', path '/workspace/verl/ART/.art/mcp_alphavantage/models/mcp-14b-alpha-001/checkpoints/0000'
INFO 08-13 18:10:27 [serving_chat.py:80] "auto" tool choice has been enabled please note that while the parallel_tool_calls client option is preset for compatibility reasons, it will be ignored.
INFO 08-13 18:10:27 [api_server.py:1090] Starting vLLM API server on http://0.0.0.0:8000
INFO 08-13 18:10:27 [launcher.py:28] Available routes are:
INFO 08-13 18:10:27 [launcher.py:36] Route: /openapi.json, Methods: GET, HEAD
INFO 08-13 18:10:27 [launcher.py:36] Route: /docs, Methods: GET, HEAD
INFO 08-13 18:10:27 [launcher.py:36] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 08-13 18:10:27 [launcher.py:36] Route: /redoc, Methods: GET, HEAD
INFO 08-13 18:10:27 [launcher.py:36] Route: /health, Methods: GET
INFO 08-13 18:10:27 [launcher.py:36] Route: /load, Methods: GET
INFO 08-13 18:10:27 [launcher.py:36] Route: /ping, Methods: POST, GET
INFO 08-13 18:10:27 [launcher.py:36] Route: /tokenize, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /detokenize, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/models, Methods: GET
INFO 08-13 18:10:27 [launcher.py:36] Route: /version, Methods: GET
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/chat/completions, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/completions, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/embeddings, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /pooling, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /score, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/score, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/audio/transcriptions, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /rerank, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v1/rerank, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /v2/rerank, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /invocations, Methods: POST
INFO 08-13 18:10:27 [launcher.py:36] Route: /metrics, Methods: GET
INFO: Started server process [1073]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 127.0.0.1:52988 - "GET /metrics HTTP/1.1" 200
INFO: 127.0.0.1:41108 - "GET /metrics HTTP/1.1" 200
INFO: 127.0.0.1:40812 - "GET /metrics HTTP/1.1" 200
INFO 08-13 18:10:57 [launcher.py:79] Shutting down FastAPI HTTP server.
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.

I just pushed something to address the OpenAI-compatible server hanging. Hopefully it will now crash instead of getting stuck, and you can add retry logic like the following if you like:

for _ in range(RETRIES):
  # register for every try
  await model.register(backend)
  try:
    # train loop, something like this
    for _ in range(await model.get_step(), 1_000):
      train_groups = await art.gather_trajectory_groups(
          (
              art.TrajectoryGroup(rollout(openai_client, prompt) for _ in range(32))
              for prompt in prompts
          ),
          pbar_desc="gather",
      )
      await model.train(train_groups)
  except Exception:
    pass

Not sure if this will address the underlying issue, so would be interested to hear if it helps.

To get the latest version of openpipe-art:

uv add 'git+https://github.com/OpenPipe/ART.git#egg=openpipe-art[backend]'

Same issue.

Has anyone fixed this?

I increased the timeout to 120s, but it's still stuck:

INFO 09-24 14:56:26 [api_server.py:1755] vLLM API server version 0.10.0
INFO 09-24 14:56:26 [cli_args.py:261] non-default args: {'api_key': 'default', 'lora_modules': [LoRAModulePath(name='email-agent-001', path='/home/art-e/.art/email-search-agent/models/email-agent-001/checkpoints/0000', base_model_name=None)], 'return_tokens_as_token_ids': True, 'enable_auto_tool_choice': True, 'tool_call_parser': 'hermes', 'model': './models/qwen3-4B', 'served_model_name': ['./models/qwen3-4B'], 'generation_config': 'vllm', 'num_scheduler_steps': 16, 'disable_log_requests': True}
INFO 09-24 14:56:26 [serving_models.py:162] Loaded new LoRA adapter: name 'email-agent-001', path '/home/art-e/.art/email-search-agent/models/email-agent-001/checkpoints/0000'
INFO 09-24 14:56:26 [serving_chat.py:84] "auto" tool choice has been enabled please note that while the parallel_tool_calls client option is preset for compatibility reasons, it will be ignored.
INFO 09-24 14:56:26 [api_server.py:1818] Starting vLLM API server 0 on http://0.0.0.0:8000
INFO 09-24 14:56:26 [launcher.py:29] Available routes are:
INFO 09-24 14:56:26 [launcher.py:37] Route: /openapi.json, Methods: HEAD, GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /docs, Methods: HEAD, GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /redoc, Methods: HEAD, GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /health, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /load, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /ping, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /ping, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /tokenize, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /detokenize, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/models, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /version, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/responses, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/responses/{response_id}, Methods: GET
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/responses/{response_id}/cancel, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/chat/completions, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/completions, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/embeddings, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /pooling, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /classify, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /score, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/score, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/audio/transcriptions, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/audio/translations, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /rerank, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v1/rerank, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /v2/rerank, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /scale_elastic_ep, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /is_scaling_elastic_ep, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /invocations, Methods: POST
INFO 09-24 14:56:26 [launcher.py:37] Route: /metrics, Methods: GET
INFO:     Started server process [27022]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:58128 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:38244 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:42832 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:40584 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:51826 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:44798 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:38830 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:47326 - "GET /metrics HTTP/1.1" 200
INFO:     127.0.0.1:59726 - "GET /metrics HTTP/1.1" 200
INFO 09-24 14:58:26 [launcher.py:80] Shutting down FastAPI HTTP server.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.

I've tried your latest patched version; it doesn't work, still stuck.

Guys, you forgot to exit the loop... Have you ever tested it?
Fix PR: #418

great work @dcalsky!

@bradhilton should I simply reinstall art using pip?

You can try directly using the latest git version with pip or uv. We probably won't do a new ART release till next week or so.
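
For example, a pip equivalent of the uv command above (same git URL and [backend] extra; exact syntax untested, so treat it as a sketch):

pip install 'openpipe-art[backend] @ git+https://github.com/OpenPipe/ART.git'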