Agent(s) fail to start within 30 seconds
JamesKEbert opened this issue · 1 comment
JamesKEbert commented
Unfortunately, when trying to execute the runsets for aries-vcx or acapy (while testing #833), the agents fail to start. The following is what I am attempting on a Linux machine:
./manage runset ariesvcx-ariesvcx -b
This fails with:
Running runset: ariesvcx-ariesvcx, using agents: -d aries-vcx
Scope of tests: -t @RFC0036,@RFC0037,@RFC0160,@RFC0023,@revocation -t ~@T002-HIPE0011 -t ~@T014-HIPE001 -t ~@wip -t ~@RFC0183 -t ~@DIDExchangeConnection -t ~@RFC0211 -t ~@RFC0453 -t ~@RFC0025 -t ~@RFC0434 -t ~@QualifiedDIDs
Agents to build: aries-vcx
Backchannel Folder: aries-backchannels/aries-vcx
Building aries-vcx-agent-backchannel ...
[+] Building 0.8s (5/5) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile.aries-vcx 0.0s
=> => transferring dockerfile: 129B 0.0s
=> [internal] load metadata for ghcr.io/hyperledger/aries-vcx/aath-backchannel:0.64.0-gmaath-064-update-7597 0.4s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 269B 0.0s
=> CACHED [1/1] FROM ghcr.io/hyperledger/aries-vcx/aath-backchannel:0.64.0-gmaath-064-update-7597@sha256:2556044b4c09aae2d6a4ecd0034e23e2e970 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:6ea7b74bb8d582b090cf2de5f44ff71ffabc625eb83e62bb9632a7e1ca9de1ba 0.0s
=> => naming to docker.io/library/aries-vcx-agent-backchannel 0.0s
Build multi-platform images faster with Docker Build Cloud: https://docs.docker.com/go/docker-build-cloud
Building aries-test-harness ...
[+] Building 5.7s (11/11) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile.harness 0.0s
=> => transferring dockerfile: 349B 0.0s
=> [internal] load metadata for docker.io/library/python:3.7-buster 5.5s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 148B 0.0s
=> [1/6] FROM docker.io/library/python:3.7-buster@sha256:2539f956bcccbac5e4a48ebdafbbbfbd26b4ab56e65b96076ae9cd1188b119b3 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 6.42kB 0.0s
=> CACHED [2/6] COPY ./requirements.txt /aries-test-harness/ 0.0s
=> CACHED [3/6] WORKDIR /aries-test-harness 0.0s
=> CACHED [4/6] RUN pip install -r requirements.txt 0.0s
=> CACHED [5/6] RUN echo "==> Install stuff not in the requirements..." && pip install --no-cache-dir aiohttp 0.0s
=> CACHED [6/6] COPY . /aries-test-harness 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:0870116332cb9c53943f71b017ceff1c350bd875da96d97b93421a8287a2fdb1 0.0s
=> => naming to docker.io/library/aries-test-harness 0.0s
Build multi-platform images faster with Docker Build Cloud: https://docs.docker.com/go/docker-build-cloud
Agents to be used:
Acme - aries-vcx
Bob - aries-vcx
Faber - aries-vcx
Mallory - aries-vcx
starting local von-network...
[+] Running 14/14
✔ Network von_von Created 0.1s
✔ Volume "von_node2-data" Created 0.0s
✔ Volume "von_node3-data" Created 0.0s
✔ Volume "von_node4-data" Created 0.0s
✔ Volume "von_client-data" Created 0.0s
✔ Volume "von_webserver-cli" Created 0.0s
✔ Volume "von_webserver-ledger" Created 0.0s
✔ Volume "von_nodes-data" Created 0.0s
✔ Volume "von_node1-data" Created 0.0s
✔ Container von-node3-1 Started 0.4s
✔ Container von-node4-1 Started 0.4s
✔ Container von-webserver-1 Started 0.4s
✔ Container von-node1-1 Started 0.4s
✔ Container von-node2-1 Started 0.4s
Want to see the scrolling container logs? Run "./manage logs"
waiting for ledger to start/respond............
starting local uniresolver...
Using: docker --log-level error compose
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3091 100 3091 0 0 967k 0 --:--:-- --:--:-- --:--:-- 1006k
[+] Running 2/2
✔ Container uni-resolver-web.local Started 0.1s
✔ Container driver-did-sov.local Started 0.2s
starting local indy-tails-server...
Using: docker --log-level error compose
[+] Running 3/3
✔ Network docker_tails-server Created 0.1s
✔ Container docker-ngrok-tails-server-1 Started 0.3s
✔ Container docker-tails-server-1 Started 0.3s
waiting for tails server to start.
waiting for uniresolver to start.....
Starting Acme Agent using aries-vcx-agent-backchannel ...
Starting Bob Agent using aries-vcx-agent-backchannel ...
Starting Faber Agent using aries-vcx-agent-backchannel ...
Starting Mallory Agent using aries-vcx-agent-backchannel ...
waiting for Acme agent to start..............................
The agent failed to start within 30 seconds.
waiting for Bob agent to start...............................
The agent failed to start within 30 seconds.
waiting for Faber agent to start..............................
The agent failed to start within 30 seconds.
waiting for Mallory agent to start..............................
The agent failed to start within 30 seconds.
<test results here>
<agent and services shutdown here>
The logs when inspecting acme_agent for Aries VCX:
thread 'main' panicked at aries/agents/aath-backchannel/src/setup.rs:38:14:
Failed to send message: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(192.168.65.9)), port: Some(9000), path: "/register", query: None, fragment: None }, source: hyper_util::client::legacy::Error(Connect, ConnectError("tcp connect error", Os { code: 110, kind: TimedOut, message: "Operation timed out" })) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The logs when inspecting acme_agent for ACA-Py:
Starting aca-py agent ...
Error loading genesis transactions:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1025, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1065, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1050, in create_connection
sock = await self._connect_sock(
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 961, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 500, in sock_connect
return await fut
File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 535, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 110] Connect call failed ('192.168.65.9', 9000)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/aries-backchannels/python/agent_backchannel.py", line 73, in default_genesis_txns
async with session.get(f"{ledger_url}/genesis") as resp:
File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1197, in __aenter__
self._resp = await self._coro
File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 581, in _request
conn = await self._connector.connect(
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 944, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1257, in _create_direct_connection
raise last_exc
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1226, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1033, in _wrap_create_connection
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 192.168.65.9:9000 ssl:default [Connect call failed ('192.168.65.9', 9000)]
Error retrieving ledger genesis transactions
This smells to me like a Docker networking issue, since the agents appear to fail right when they reach out to the local von-network. Any ideas?
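One way to check that hypothesis is to probe the ledger endpoint the backchannels are using, either from the host or from inside one of the agent containers via docker exec. This is a diagnostic sketch, not part of the harness; the 192.168.65.9 fallback, port 9000, and the /genesis path are taken from the error messages above:

```shell
#!/bin/sh
# Hypothetical reachability probe for the von-network ledger webserver.
# DOCKERHOST is the address the backchannels use to reach host services;
# the failing runs above resolved it to 192.168.65.9.
LEDGER_HOST="${DOCKERHOST:-192.168.65.9}"
LEDGER_URL="http://${LEDGER_HOST}:9000"
echo "checking ledger at ${LEDGER_URL}/genesis"
# --max-time keeps the probe from hanging the way the agents did;
# -f makes curl fail on HTTP errors, -s suppresses the progress meter.
if curl -sf --max-time 5 "${LEDGER_URL}/genesis" > /dev/null 2>&1; then
  echo "ledger reachable"
else
  echo "ledger unreachable from this shell"
fi
```

If this fails from inside a container but succeeds on the host, the containers simply cannot route to the address DOCKERHOST resolves to.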
JamesKEbert commented
It was a Docker networking issue. Prefixing the command with DOCKERHOST=host.docker.internal made everything work as expected. Thanks for the suggestion @gmulhearn!
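For anyone hitting the same failure, the full invocation looks like this (same runset as in the report; DOCKERHOST appears to be read by the manage script to tell the containers where to find host-published services such as the von-network ledger):

```shell
# Same runset command as above, with the host override in front.
# host.docker.internal resolves to the Docker host from inside containers;
# it is built in on Docker Desktop, while on plain Linux it may require
# --add-host with host-gateway on a recent Docker Engine.
DOCKERHOST=host.docker.internal ./manage runset ariesvcx-ariesvcx -b
```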