Timeout issue
Closed this issue · 1 comments
giladfrid009 commented
In the art.vllm.server module, the function test_client() often hits its 10-second timeout while the model is still initializing. This happens very often. After manually raising the limit (I set it to 5 minutes), practically all of the timeout exceptions that occurred when loading the model via backend.register() went away.
It would be very helpful if this timeout could be controlled through a parameter (or an environment variable). Failing that, simply raising the hard-coded value to a larger constant would also help.
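To illustrate the request, here is a rough sketch of how such a setting could be resolved. This is not the existing openpipe-art code: the helper name resolve_timeout and the environment variable ART_SERVER_TIMEOUT are hypothetical, chosen only for this example. The only value taken from the current behavior is the 10-second default.

```python
import os

# Matches the current hard-coded limit described in this issue.
DEFAULT_TIMEOUT_SECONDS = 10.0


def resolve_timeout(timeout=None, env_var="ART_SERVER_TIMEOUT",
                    default=DEFAULT_TIMEOUT_SECONDS):
    """Pick a timeout in seconds.

    Precedence: explicit argument, then environment variable, then default.
    Both the function and the variable name are hypothetical.
    """
    if timeout is not None:
        value = float(timeout)
    else:
        raw = os.environ.get(env_var)
        if raw is None or raw.strip() == "":
            return float(default)
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(
                f"{env_var} must be a number of seconds, got {raw!r}"
            ) from None
    if value <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return value
```

With something like this, setting ART_SERVER_TIMEOUT=300 in the environment would give the 5-minute limit that worked for me, while anyone who sets nothing keeps the current 10 seconds.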
Some basic sysinfo:
Python 3.12.11
openpipe-art 0.4.4
GPU: 3x NVIDIA A40
CPUs: 96x Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz