pytorch/serve

torchserve bloom7b1 demo Load model failed

zqc2011hy opened this issue · 2 comments

๐Ÿ› Describe the bug

2024-06-22T03:41:52,860 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
2024-06-22T03:41:52,861 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:242) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bloom7b1, error: Worker died.
2024-06-22T03:41:52,863 [DEBUG] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-bloom7b1_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-06-22T03:41:52,864 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STOPPED

Error logs

Same as above.

Installation instructions

https://kserve.github.io/website/latest/modelserving/v1beta1/llm/torchserve/accelerate/

Model Packaging

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

config.properties

No response

Versions

torchserve --start --model-store=/mnt/models/model-store --ts-config=/mnt/models/config/config.properties

Repro instructions

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

Possible Solution

No response

Hi @zqc2011hy Looking at this log: Backend worker did not respond in given time, it seems you need to increase the default_response_timeout value in config.properties.
This value will vary depending on the hardware you are using.
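For example, something along these lines in config.properties (600 is an illustrative value, not a recommendation; pick a timeout that covers how long bloom7b1 takes to load and answer on your hardware):

# Timeout, in seconds, before a backend worker is deemed unresponsive and restarted.
# The default of 120s can be too short for loading a 7B model with accelerate.
default_response_timeout=600

After editing the file, restart TorchServe so the new timeout takes effect.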

SERVICE_HOSTNAME=$(kubectl get inferenceservice bloom7b1 -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./text.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bloom7b1:predict

{"predictions":["My dog is cute.\nNice.\n- Hey, Mom.\n- Yeah?\nWhat color's your dog?\n- It's gray.\n- Gray?\nYeah.\nIt looks gray to me.\n- Where'd you get it?\n- Well, Dad says it's kind of...\n- Gray?\n- Gray.\nYou got a gray dog?\n- It's gray.\n- Gray.\nIs your dog gray?\nAre you sure?\nNo.\nYou sure"]}

Could you please share the exact contents of text.json?
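For reference, the v1 predict endpoint generally expects a payload of this shape (a sketch assuming the standard KServe v1 inference protocol, with the prompt inferred from the prediction above; the exact file shipped with the example may differ):

{
  "instances": [
    {
      "data": "My dog is cute"
    }
  ]
}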