torchserve bloom7b1 demo Load model failed
zqc2011hy opened this issue · 2 comments
🐛 Describe the bug
2024-06-22T03:41:52,860 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
2024-06-22T03:41:52,861 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:242) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bloom7b1, error: Worker died.
2024-06-22T03:41:52,863 [DEBUG] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-bloom7b1_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-06-22T03:41:52,864 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STOPPED
Error logs
(Same logs as in the description above.)
Installation instructions
https://kserve.github.io/website/latest/modelserving/v1beta1/llm/torchserve/accelerate/
Model Packaging
gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom
config.properties
No response
Versions
torchserve --start --model-store=/mnt/models/model-store --ts-config=/mnt/models/config/config.properties
Repro instructions
gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom
Possible Solution
No response
Hi @zqc2011hy Looking at this log: Backend worker did not respond in given time
it seems you need to increase the default_response_timeout value in config.properties.
This value may need to change depending on the hardware you are using.
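For reference, a minimal sketch of the relevant line in config.properties. TorchServe's default_response_timeout is 120 seconds; the value below is only an illustrative placeholder, not a recommendation — tune it to how long model loading actually takes on your hardware:

```properties
# Timeout (in seconds) the frontend waits for the backend worker to respond.
# 2000 is a placeholder for illustration; pick a value based on how long
# bloom7b1 takes to load on your hardware.
default_response_timeout=2000
```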
SERVICE_HOSTNAME=$(kubectl get inferenceservice bloom7b1 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./text.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bloom7b1:predict
{"predictions":["My dog is cute.\nNice.\n- Hey, Mom.\n- Yeah?\nWhat color's your dog?\n- It's gray.\n- Gray?\nYeah.\nIt looks gray to me.\n- Where'd you get it?\n- Well, Dad says it's kind of...\n- Gray?\n- Gray.\nYou got a gray dog?\n- It's gray.\n- Gray.\nIs your dog gray?\nAre you sure?\nNo.\nYou sure"]}
Could you please share the specific contents of text.json?
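For what it's worth, the curl above targets the KServe v1 predict endpoint, which expects an "instances" list in the request body, so text.json plausibly looks like the sketch below. This is an assumption: the exact field names depend on the handler in the example model archive, and the prompt string is only inferred from the prediction output shown above:

```json
{
  "instances": [
    {
      "data": "My dog is cute"
    }
  ]
}
```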