bentoml/Yatai

docker crash on readyz call

Roalkege opened this issue · 0 comments

Hello, I want to try out Yatai with BentoML.
I'm not sure whether this is a BentoML issue or a Yatai one...

I containerized my bento and wanted to test the API.
After launching it, everything on localhost:3000 works fine...
But after calling /readyz, my Docker Desktop crashes.

I have the same problem on my Yatai instance after deploying a service.

Yatai log:

```
[2023-06-14 13:40:01] [Pod] [governance-b5dcf944c-8rqdz] [Created] Created container main
[2023-06-14 13:40:01] [Pod] [governance-b5dcf944c-8rqdz] [Started] Started container main
[2023-06-14 13:40:06] [Pod] [governance-b5dcf944c-8rqdz] [Unhealthy] Liveness probe errored: rpc error: code = Unknown desc = container not running (b50e5f47871d15a73d1a10f593ffa07c42336a90aaf6406221c069d06a323250)
[2023-06-14 13:40:06] [Pod] [governance-b5dcf944c-8rqdz] [Unhealthy] Readiness probe errored: rpc error: code = Unknown desc = container not running (b50e5f47871d15a73d1a10f593ffa07c42336a90aaf6406221c069d06a323250)
[2023-06-14 13:40:17] [HorizontalPodAutoscaler] [governance] [FailedGetResourceMetric] failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
[2023-06-14 13:40:17] [HorizontalPodAutoscaler] [governance] [FailedComputeMetricsReplicas] invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Pulled] Container image "127.0.0.1:5000/yatai-bentos:yatai.governance_classifier.hcgdqdakukp2yaav" already present on machine
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Created] Created container main
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Started] Started container main
[2023-06-14 13:40:26] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Unhealthy] Readiness probe failed: Get "http://10.244.0.37:3000/readyz": dial tcp 10.244.0.37:3000: connect: connection refused
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Pulled] Container image "127.0.0.1:5000/yatai-bentos:yatai.governance_classifier.hcgdqdakukp2yaav" already present on machine
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Created] Created container main
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Started] Started container main
[2023-06-14 13:42:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [BackOff] Back-off restarting failed container main in pod governance-runner-0-85465c6b86-h2r8t_yatai(9f6cb8f8-e592-4fbb-aea3-89e969bdfc72)
[2023-06-14 13:42:31] [Pod] [governance-b5dcf944c-8rqdz] [BackOff] Back-off restarting failed container main in pod governance-b5dcf944c-8rqdz_yatai(89e948b9-effb-4f82-82e4-4bcd426a6b88)
[2023-06-14 13:42:32] [HorizontalPodAutoscaler] [governance] [FailedGetResourceMetric] failed to get cpu utilization: did not receive metrics for any ready pods
[2023-06-14 13:42:41] [Pod] [governance-runner-0-85465c6b86-bnddg] [BackOff] Back-off restarting failed container main in pod governance-runner-0-85465c6b86-bnddg_yatai(196d4f8f-88a4-4241-a19e-c6836289e3de)
[2023-06-14 13:45:55] [BentoDeployment] [governance] [GetDeployment] Getting Deployment yatai/governance-runner-0
```
Docker log:

```
2023-06-14T11:53:19+0000 [ERROR] [api_server:10] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 86, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
    await self.app(scope, inner_receive, inner_send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/instruments.py", line 176, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 579, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 727, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 285, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 69, in app
    response = await func(request)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http_app.py", line 286, in readyz
    runners_ready = all(await asyncio.gather(*runner_statuses))
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 156, in runner_handle_is_ready
    return await self._runner_handle.is_ready(timeout)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 304, in is_ready
    async with self._client.get(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 914, in start
    self._continue = None
  File "/usr/local/lib/python3.9/site-packages/aiohttp/helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
2023-06-14T11:53:20+0000 [WARNING] [runner:governance:1] No training configuration found in save file, so the model was *not* compiled. Compile it manually.
```
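For context: the traceback shows that the `/readyz` handler gathers an `is_ready` probe for every runner, and one of those probes raises `asyncio.TimeoutError` because its runner never answers, which matches the "connection refused" readiness events in the Yatai log. A minimal sketch of that aggregation pattern (the function names here are hypothetical stand-ins, not BentoML's actual API):

```python
import asyncio


async def runner_is_ready(delay: float) -> bool:
    # Stand-in for an HTTP probe against a runner's /readyz endpoint;
    # `delay` simulates how long the runner takes to respond.
    await asyncio.sleep(delay)
    return True


async def readyz(timeout: float = 0.1) -> bool:
    # Mirrors the pattern in the traceback: gather all runner statuses,
    # each probe bounded by a timeout.
    statuses = [
        asyncio.wait_for(runner_is_ready(0.01), timeout),  # healthy runner
        asyncio.wait_for(runner_is_ready(1.0), timeout),   # unresponsive runner
    ]
    try:
        return all(await asyncio.gather(*statuses))
    except asyncio.TimeoutError:
        # One stuck runner is enough to make the whole /readyz call fail.
        return False
```

So the `TimeoutError` in the Docker log is a symptom: the API server's readiness check is waiting on a runner that never came up, rather than the cause of the container dying.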