Networking problem: intermittent "No route to host" / "Failed to connect" errors
Closed this issue · 3 comments
Have you read the documentation?
- Yes, but it does not include related information regarding my question.
You are setting up gotify in
- Docker
Describe your problem
I'm running Gotify using Docker on a Debian host. It listens on port 8003 (port 80 in container is mapped to port 8003 on host).
gb@server $ sudo netstat -anp | grep 8003
tcp 0 0 0.0.0.0:8003 0.0.0.0:* LISTEN 3039255/docker-prox
gb@server $ docker inspect gotify | grep HostPort
"HostPort": "8003"
Most of the time, say 9 times out of 10, when I try to connect to Gotify from the server itself, I get a "No route to host" error. This happens whether I use my LAN IP or Docker's network IP:
gb@server $ curl http://192.168.155.88:8003/
curl: (7) Failed to connect to 192.168.155.88 port 8003: No route to host
gb@server $ curl http://172.18.0.1:8003/
curl: (7) Failed to connect to 172.18.0.1 port 8003: No route to host
But sometimes, for no apparent reason, it just works...
gb@server $ curl http://172.18.0.1:8003/
<!doctype html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" ...
If I use localhost or 127.0.0.1 to connect, it also works only 1/10 times, but the error is "Connection reset by peer":
gb@server $ curl http://localhost:8003/
curl: (56) Recv failure: Connection reset by peer
gb@server $ curl http://127.0.0.1:8003/
curl: (56) Recv failure: Connection reset by peer
This problem doesn't happen with other Docker containers I'm running. I have about 60 in total, most of them listening on some port, and they all seem to work fine.
When I try to connect to Gotify from a remote host, using the LAN IP (or the VPN IP), it works 9 times out of 10 (i.e. much more often). And when it fails, the error is "Couldn't connect to server" after what seems like a random time between 2 and 20 seconds:
gb@workstation $ curl "http://192.168.155.88:8003/"
<!doctype html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" ...
gb@workstation $ curl "http://192.168.155.88:8003/"
curl: (7) Failed to connect to 192.168.155.88 port 8003 after 3115 ms: Couldn't connect to server
gb@workstation $ curl "http://192.168.155.88:8003/"
curl: (7) Failed to connect to 192.168.155.88 port 8003 after 16215 ms: Couldn't connect to server
And sometimes, when it works, instead of returning in less than 500ms, there will be a much longer delay to receive a response:
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m0.141s
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m0.634s
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m0.348s
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m0.240s
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m0.442s
gb@workstation $ time bash -c 'curl -so /dev/null "http://192.168.155.88:8003/" ; echo $?'
0
real 0m19.558s # Worked, but took 20s
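To put numbers on the failure rate and the slow responses, a loop along these lines could replace the manual retries. It is only a sketch: the function names probe and summarize, the 30-second timeout and the 1-second "slow" threshold are my own choices, not anything Gotify or Docker provides.

```shell
#!/usr/bin/env bash
# Sketch: probe a URL repeatedly, then tally curl exit codes and slow successes.
# The names, the 30s timeout and the 1s pause are illustrative assumptions.

# Print one "exit_code seconds" line per attempt.
probe() {
  local url=$1 attempts=$2 i secs rc
  for ((i = 0; i < attempts; i++)); do
    # -w prints the total time; the curl exit code tells us how it failed
    # (7 = failed to connect, 56 = receive failure, 0 = success).
    secs=$(curl -s -o /dev/null --max-time 30 -w '%{time_total}' "$url")
    rc=$?
    echo "$rc ${secs:-0}"
    sleep 1
  done
}

# Read "exit_code seconds" lines on stdin and print counts per exit code,
# plus how many successful requests took longer than the threshold ($1).
summarize() {
  awk -v slow="${1:-1}" '
    { total++; codes[$1]++ }
    $1 == 0 && $2 + 0 > slow + 0 { slow_ok++ }
    END {
      printf "attempts=%d slow_ok=%d\n", total, slow_ok
      for (c in codes) printf "exit_%s=%d\n", c, codes[c]
    }'
}
```

For example, `probe http://192.168.155.88:8003/ 50 | summarize 1` run once on the server and once on the workstation would give comparable failure counts for both sides.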
When an error occurs, whether I connect from the server or the workstation, the Gotify logs show nothing. And when a response takes 20s, the logs don't reflect that; they always show a 50-70µs response time:
2024-10-26T08:23:03-04:00 | 200 | 75.321µs | 192.168.155.44 | GET "/"
2024-10-26T08:23:04-04:00 | 200 | 63.134µs | 192.168.155.44 | GET "/"
2024-10-26T08:23:05-04:00 | 200 | 62.587µs | 192.168.155.44 | GET "/"
2024-10-26T08:23:06-04:00 | 200 | 57.22µs | 192.168.155.44 | GET "/"
2024-10-26T08:23:07-04:00 | 200 | 51.8µs | 192.168.155.44 | GET "/"
The only error I can see in the Gotify logs is that when a remote host does connect (e.g. the mobile app on Android), the websocket ends up disconnecting with an i/o timeout error after a while:
2024-10-26T08:37:33-04:00 | 200 | 13.719278ms | 172.18.0.1 | GET "/stream?token=[masked]"
2024-10-26T08:37:36-04:00 | 200 | 974.083µs | 172.18.0.1 | GET "/message?limit=10"
WebSocket: ReadError read tcp 172.18.0.29:80->172.18.0.1:56152: i/o timeout
I have no firewall set up, ping always works, and the routes all look fine:
$ sudo iptables -S INPUT
-P INPUT ACCEPT
$ sudo route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default modem 0.0.0.0 UG 0 0 0 enp6s0f1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-c99fa089877c
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-f2c2237e5ed4
192.168.155.0 0.0.0.0 255.255.255.0 U 0 0 0 enp6s0f1
192.168.156.0 0.0.0.0 255.255.255.0 U 0 0 0 nebula1
$ ping 192.168.155.88
PING 192.168.155.88 (192.168.155.88) 56(84) bytes of data.
64 bytes from 192.168.155.88: icmp_seq=1 ttl=64 time=0.071 ms
64 bytes from 192.168.155.88: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 192.168.155.88: icmp_seq=3 ttl=64 time=0.041 ms
64 bytes from 192.168.155.88: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.168.155.88 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3073ms
rtt min/avg/max/mdev = 0.035/0.045/0.071/0.014 ms
Any ideas on how I could debug this further?
There is most likely a problem in your network setup. Maybe there you will find some leads.
As discussed above, this unfortunately feels like a networking issue and seems to be pretty specific to your setup. Posting more information may or may not help (ip addr, ip rule, ip neigh and all iptables chains). Have you run any other server application on Docker before?
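One way to gather that information in a single paste-able file is sketched below. The collect helper is hypothetical, and using iptables-save to dump every chain is my assumption about what "all iptables chains" means here.

```shell
#!/usr/bin/env bash
# Sketch: run each command passed as an argument and print its output
# under a header line, so the result can be attached to the issue.
collect() {
  local cmd
  for cmd in "$@"; do
    printf '### %s\n' "$cmd"
    # Keep going when a command fails, but record its exit status.
    bash -c "$cmd" 2>&1 || printf '(exit status %d)\n' "$?"
  done
}
```

It could be called as `collect 'ip addr' 'ip rule' 'ip neigh' 'sudo iptables-save' > netdiag.txt`. Mask any addresses or tokens you don't want public before posting the file.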
For further debugging this may be helpful:
I installed tcpdump to look further into what was happening, stopped a cloudflared (tunnel) container to stop the spam in the tcpdump output, and changed net.core.wmem_max and net.core.rmem_max to 7500000 (a recommendation I found in the cloudflared logs), and after all that, this problem was gone... ¯\_(ツ)_/¯
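For anyone landing here with the same symptoms, a quick check of the two buffer limits might look like the sketch below. The function names are made up for illustration, and 7500000 is simply the value mentioned above. Since three things changed at once, nothing in this thread establishes which of them actually fixed the problem.

```shell
#!/usr/bin/env bash
# Sketch: report the current socket buffer limits and flag values
# below a wanted minimum (defaults to the 7500000 used in this thread).

# Succeeds when the current value ($1) is below the wanted one ($2).
below() { [ "$1" -lt "$2" ]; }

check_buffers() {
  local want=${1:-7500000} key cur
  for key in rmem_max wmem_max; do
    # Skip keys that cannot be read (e.g. on non-Linux systems).
    cur=$(cat "/proc/sys/net/core/$key" 2>/dev/null) || continue
    if below "$cur" "$want"; then
      echo "net.core.$key=$cur (below $want)"
    else
      echo "net.core.$key=$cur"
    fi
  done
}
```

If a value is below the target, `sudo sysctl -w net.core.rmem_max=7500000 net.core.wmem_max=7500000` raises it until the next reboot. A file under /etc/sysctl.d/ is the usual place to make it persistent on Debian.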