facebookresearch/AnimatedDrawings

Process is blocked in drawn_humanoid_pose_estimator

Closed this issue · 1 comment

I launch `python image_to_animation.py drawings/garlic.png garlic_out`.
The script gets stuck on the POST request: `resp = requests.post("http://localhost:8080/predictions/drawn_humanoid_pose_estimator", files=data_file, verify=False)`

I run it in a Docker container with 20 GB of RAM. I don't think it's a RAM problem, since the container only uses 13 GB.
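
For reference, the blocking call can be reproduced outside the script with a timeout so it fails fast instead of hanging (a minimal sketch; the URL and flags are the ones from the script, but the `"data"` file key is an assumption on my part):

```python
import requests

# Standalone probe of the TorchServe endpoint the script blocks on.
# A timeout makes requests raise instead of hanging if no worker ever answers.
url = "http://localhost:8080/predictions/drawn_humanoid_pose_estimator"
with open("drawings/garlic.png", "rb") as f:
    try:
        resp = requests.post(url, files={"data": f}, verify=False, timeout=30)
        print(resp.status_code, resp.text[:300])
    except requests.exceptions.RequestException as e:
        print("request failed:", e)
```

As far as I understand, with the worker dead (see the log below) this probe should either time out or come back with a server error rather than block.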

Here are the last lines of the Docker log:

```
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG - Backend worker process died.
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     worker.run_server()
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -                             ^^^^^^^^^^^^^^^^^^^^
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     service = model_loader.load(
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -               ^^^^^^^^^^^^^^^^^^
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_loader.py", line 108, in load
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-30T10:18:45,112 [INFO ] epollEventLoopGroup-5-32 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-30T10:18:45,112 [DEBUG] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/importlib/__init__.py", line 90, in import_module
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
2024-04-30T10:18:45,112 [DEBUG] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:120sec
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
2024-04-30T10:18:45,112 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 995, in exec_module
2024-04-30T10:18:45,113 [WARN ] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: drawn_humanoid_pose_estimator, error: Worker died.
2024-04-30T10:18:45,113 [DEBUG] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-drawn_humanoid_pose_estimator_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
2024-04-30T10:18:45,113 [WARN ] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/tmp/models/616337bc7e8e4f0396a01d96d6b2a8ed/mmpose_handler.py", line 8, in <module>
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     from mmpose.apis import (inference_bottom_up_pose_model,
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.12/site-packages/mmpose/__init__.py", line 24, in <module>
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG -     assert (mmcv_version >= digit_version(mmcv_minimum_version)
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout MODEL_LOG - AssertionError: MMCV==1.7.2 is used but incompatible. Please install mmcv>=1.3.8, <=1.7.0.
2024-04-30T10:18:45,113 [INFO ] W-9000-drawn_humanoid_pose_estimator_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-drawn_humanoid_pose_estimator_1.0-stdout
2024-04-30T10:18:45,113 [WARN ] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-drawn_humanoid_pose_estimator_1.0-stderr
2024-04-30T10:18:45,113 [WARN ] W-9000-drawn_humanoid_pose_estimator_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-drawn_humanoid_pose_estimator_1.0-stdout
2024-04-30T10:18:45,134 [WARN ] W-9015-drawn_humanoid_pose_estimator_1.0-stderr MODEL_LOG - /opt/conda/lib/python3.12/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
```
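
Right before the worker dies, the MODEL_LOG lines show the actual Python failure: mmpose refuses to import because mmcv 1.7.2 is newer than the `<=1.7.0` it accepts, so loading the model keeps failing and the HTTP request never gets answered. A quick way to double-check the installed version from inside the container (a small sketch; it assumes `packaging` is importable in the container's Python environment, and the bounds are copied from the assertion message):

```python
# Run inside the TorchServe container's Python environment.
from packaging.version import parse  # assumes the `packaging` module is available there
import mmcv

low, high = parse("1.3.8"), parse("1.7.0")  # accepted range, copied from the AssertionError
installed = parse(mmcv.__version__)

print("installed mmcv:", mmcv.__version__)
if not (low <= installed <= high):
    # Downgrading should satisfy mmpose's import-time check, e.g. something like
    #   pip install "mmcv-full==1.7.0"
    # (assuming mmcv-full is the package the image installs).
    print("mmcv is outside the range mmpose accepts")
```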


The rest of the trace looks like Java errors. How do I fix this?

You can try limiting the CPU and memory: `docker run --cpus 4 -m 8g`