Requests hang when pulling from github.com
Hamxter opened this issue · 5 comments
I've encountered an unusual problem using DinD for which I haven't been able to find a solution. I'm running DinD inside a microk8s cluster to execute DevOps pipelines. The problem is that containers running inside DinD cannot pull ANY content from github.com (and only github.com, as far as I can tell); they just hang after resolving DNS and connecting.
Here is a sample of a failing request from a container running inside DinD. Note that this is not isolated to the tooling or the repository: I've tried cURL and Node to make the request and can't even get a response from wget github.com. I've also tried multiple different containers.
/ # docker run -it node:20 sh
# wget https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
--2023-12-31 23:46:29-- https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
Resolving github.com (github.com)... 20.248.137.48
Connecting to github.com (github.com)|20.248.137.48|:443... connected.
It just hangs after this point.
However, if I run the same request after exec'ing into the DinD container itself (not inside a container running in it), it works fine.
/ # wget https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
Connecting to github.com (20.248.137.48:443)
Connecting to objects.githubusercontent.com (185.199.109.133:443)
saving to 'helmfile_0.158.1_linux_amd64.tar.gz'
helmfile_0.158.1_lin 100% |********************************| 20.3M 0:00:00 ETA
'helmfile_0.158.1_linux_amd64.tar.gz' saved
My DinD deployment is simple:

image:
  repository: docker
  tag: 24-dind
  pullPolicy: IfNotPresent
env:
  DOCKER_TLS_CERTDIR: /certs
securityContext:
  privileged: true
Here are my nodes
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rachel Ready <none> 415d v1.28.3 192.168.1.9 <none> Ubuntu 22.04.3 LTS 5.15.0-91-generic containerd://1.6.15
roy Ready <none> 415d v1.28.3 192.168.1.10 <none> Ubuntu 22.04.3 LTS 5.15.0-91-generic containerd://1.6.15
deckard Ready,SchedulingDisabled <none> 415d v1.28.3 192.168.1.8 <none> Ubuntu 22.04.3 LTS 5.15.0-91-generic containerd://1.6.15
I have tried multiple different versions of DinD and couldn't get it to work. I tried replicating this in Docker on my desktop (docker -> dind -> node:20) and it worked fine. I'm not sure what else to do here, so any help would be greatly appreciated. Thanks.
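For reference, the desktop reproduction was roughly the following (a sketch; the container name dind-test is just a placeholder, and TLS is disabled here only to keep the local test short):

# start a privileged DinD daemon locally
docker run -d --privileged --name dind-test -e DOCKER_TLS_CERTDIR="" docker:24-dind

# exec into it and start a nested node:20 container
docker exec -it dind-test sh
/ # docker run -it node:20 sh

# inside the nested container, the same wget against github.com completes fine on my desktop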
This is the first time I have deployed DinD so I don't have any previous data. I believe I have found the reason for the issue but am unsure how to fix it. I used ksniff to record the packet data of the DinD container during a request.
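For anyone wanting to reproduce the capture, the ksniff invocation was roughly this (pod name and namespace are placeholders; the filter just limits the capture to HTTPS traffic):

kubectl sniff <dind-pod> -n <namespace> -f "port 443" -o dind-capture.pcap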
I think the main thing to look at is the duplicate ClientHello requests: one coming from the node:20 container within DinD (which is expected), but also one coming from the DinD container itself (identified by the Kubernetes pod IP). I made the same request against other sites and can confirm that ALL activity is duplicated (this ranges from DNS lookups to application data packets; there are always two).
There is another oddity: the protocol shown for this ClientHello is TLSv1. All requests I made to other sites used TLSv1.3 from the same container.
The only conclusion I can come to is that GitHub is ignoring the ClientHello, either because duplicate requests arrive within such a short period of time, or because of the TLSv1 version.
I believe the issue that I'm primarily trying to tackle here is the duplicate requests. I'm unsure where to go from here to try and debug this problem, so any help would be greatly appreciated.
Finally, here is a request similar to the GitHub content request, but to a different site where it completes successfully. Note all of the TCP dupes.
Now that #468 is merged and deployed, can you try again? (If it still doesn't work, try with DOCKER_IPTABLES_LEGACY=1 set 👀.)
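With the env block from your values above, that would look something like this (a sketch; assuming the chart passes the env map straight through to the dind container):

env:
  DOCKER_TLS_CERTDIR: /certs
  DOCKER_IPTABLES_LEGACY: "1"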
This did not fix the issue. It would be interesting if someone else could run ksniff on their DinD container to see if there are duplicated network calls like on my cluster. It might be a lower-level issue (I'm running microk8s).
I figured out the problem. I use Project Calico as my CNI, which uses an MTU of 1440, while DinD defaults to an MTU of 1500. This was discussed in projectcalico/calico#2334. If anyone comes across this in the future: to fix it I just added args: ["--mtu=1440"] to the DinD deployment.
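With the values layout from earlier in the thread, the change looks something like this (a sketch; assuming the chart passes args through to the dockerd entrypoint):

image:
  repository: docker
  tag: 24-dind
  pullPolicy: IfNotPresent
args: ["--mtu=1440"]   # match Calico's 1440 MTU so nested containers' packets fit inside the pod network
env:
  DOCKER_TLS_CERTDIR: /certs
securityContext:
  privileged: true

Once the daemon restarts, eth0 inside a nested container (e.g. the node:20 container) should report mtu 1440 via ip link show.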