DNS resolution issues when run in kubernetes / rancher
fdlk opened this issue · 4 comments
We're running the server in Kubernetes on a Rancher cluster using the Helm chart, and occasionally, with no apparent pattern, DNS resolution inside the server starts to fail.
In our event logs we see
Request to the [https://github.com/login/oauth/access_token] endpoint failed. Status code [-1]
Exception encountered.
java.net.UnknownHostException : Message: github.com
Our network manager pointed us to known DNS resolution issues with the musl library in Alpine images.
See for example: https://support.cloudbees.com/hc/en-us/articles/360040999471-UnknownHostException-caused-by-DNS-Resolution-issue-with-Alpine-Images
Would you consider switching from Alpine to another base image?
We'll have to review this issue. Do any of the workarounds mentioned in that KB article work for you?
The chart got updated really quickly so as a workaround we will set dnsConfig there and see if that fixes things. The nasty part of the issue is that it took us quite a while to figure out what was going wrong. We initially blamed node configuration on our on-premise cluster, since seemingly random other containers with alpine base images were also affected.
We deployed the chart, and I can see the suggested workaround applied when I shell into the FusionAuth container and run cat /etc/resolv.conf:
nameserver 10.43.0.10
search fusion-auth.svc.cluster.local svc.cluster.local cluster.local kuber.local webservice.local
options ndots:1
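For anyone who wants to verify the same thing from a script rather than by eye, here is a minimal sketch in Python. It assumes Python is available wherever you run it (it may not be inside the container), and parse_resolv_conf and SAMPLE are hypothetical names used only for illustration. It handles just the three keywords that appear in the output above.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf-style text into nameservers, search domains and options."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (resolv.conf comments start with # or ;).
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values
        elif keyword == "options":
            for opt in values:
                # Options look like "ndots:1" or a bare flag such as "rotate".
                name, _, value = opt.partition(":")
                config["options"][name] = value or True
    return config


# The resolv.conf contents quoted in this comment.
SAMPLE = """\
nameserver 10.43.0.10
search fusion-auth.svc.cluster.local svc.cluster.local cluster.local kuber.local webservice.local
options ndots:1
"""

parsed = parse_resolv_conf(SAMPLE)
print(parsed["nameservers"], parsed["options"])
```

In a real check you would read the text from /etc/resolv.conf instead of SAMPLE and compare parsed["options"].get("ndots") with the value you set in dnsConfig.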
However, we just got another UnknownHostException, so the workaround is not doing its job.
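Because the failures are intermittent, a repeated lookup can help show how often resolution actually fails. The following is only a rough sketch, not something we ran for this report: probe_dns is a hypothetical helper, it assumes Python is available in the environment being tested, and it goes through the system resolver via socket.getaddrinfo, which may not behave exactly like the JVM's lookup path.

```python
import socket
import time


def probe_dns(hostname, attempts=20, delay=0.5, resolver=socket.getaddrinfo):
    """Resolve hostname repeatedly and return a list of (attempt, error) failures.

    The resolver parameter exists so a fake can be injected in tests; by
    default the system resolver is used through socket.getaddrinfo.
    """
    failures = []
    for attempt in range(attempts):
        try:
            resolver(hostname, 443)
        except socket.gaierror as exc:
            # gaierror is what Python raises when the name cannot be resolved.
            failures.append((attempt, str(exc)))
        if delay:
            time.sleep(delay)
    return failures
```

Calling probe_dns("github.com") from inside the affected pod returns an empty list when every lookup succeeds; a non-empty list gives the attempt numbers and resolver errors, which can be compared against the timestamps of the UnknownHostException entries in the event log.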
I'm going to close this now - the FusionAuth image uses Ubuntu instead of Alpine these days.