OpenPipe/ART

Timeout issue

Closed · 1 comment

In the art.vllm.server file, the function test_client() often hits its 10-second timeout during model initialization. This happens very often. After manually raising the limit (I set it to 5 minutes), practically all of the timeout exceptions that occurred when loading the model via backend.register() went away.

It would be very helpful if this timeout could be controlled through a parameter (or an environment variable), or at least if the hard-coded value were raised to a higher constant.

Some basic sysinfo:

Python 3.12.11
openpipe-art 0.4.4
GPU: 3x NVIDIA A40
CPUs: 96x Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz

I've opened a small PR to fix this timeout issue: #292

The fix adds a configurable ART_SERVER_TIMEOUT environment variable that allows you to increase the timeout from the default 10 seconds. You can now set:

export ART_SERVER_TIMEOUT=XXX  # seconds
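For anyone curious how an override like this usually works, here is a minimal sketch in Python. It is not the code from the PR: the helper names resolve_timeout and wait_until_ready are hypothetical, and falling back to the default on an invalid or non-positive value is my own assumption. Only the variable name ART_SERVER_TIMEOUT and the 10-second default come from this thread.

```python
import os
import time
from typing import Callable, Mapping, Optional

DEFAULT_TIMEOUT_S = 10.0  # default mentioned in this issue


def resolve_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the timeout in seconds from ART_SERVER_TIMEOUT, else the default.

    Hypothetical helper: the fallback behavior is an assumption, not ART's.
    """
    env = os.environ if env is None else env
    raw = env.get("ART_SERVER_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError:
        # Assumption: ignore values that cannot be parsed as a number.
        return DEFAULT_TIMEOUT_S
    # Assumption: non-positive (or NaN) values are treated as unset.
    return value if value > 0 else DEFAULT_TIMEOUT_S


def wait_until_ready(
    check: Callable[[], bool], timeout: float, interval: float = 0.5
) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A caller would then do something like wait_until_ready(server_is_up, resolve_timeout()), where server_is_up is whatever readiness probe is in use. Using time.monotonic() for the deadline keeps the wait unaffected by system clock adjustments during a long model load.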