FusionAuth/fusionauth-containers

DNS resolution issues when run in Kubernetes / Rancher

fdlk opened this issue · 4 comments

fdlk commented

We're running the server on a Rancher-managed Kubernetes cluster using the Helm chart, and occasionally the server gets into a state where DNS resolution starts failing unpredictably.

In our event logs we see

Request to the [https://github.com/login/oauth/access_token] endpoint failed. Status code [-1]

Exception encountered.

java.net.UnknownHostException : Message: github.com

Our network manager pointed us to known DNS resolution issues with the musl library in Alpine images.
See for example: https://support.cloudbees.com/hc/en-us/articles/360040999471-UnknownHostException-caused-by-DNS-Resolution-issue-with-Alpine-Images

Would you consider switching from Alpine to another base image?

We'll have to review this issue. Do any of the workarounds mentioned in that KB article work for you?

fdlk commented

The chart got updated really quickly, so as a workaround we'll set dnsConfig there and see if that fixes things (see the sketch below). The nasty part of this issue is that it took us quite a while to figure out what was going wrong: we initially blamed node configuration on our on-premises cluster, since other, seemingly random containers with Alpine base images were also affected.
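
For reference, a minimal sketch of the pod-level override we have in mind, using the standard Kubernetes PodDNSConfig fields and assuming the chart passes dnsConfig through to the pod spec unchanged (the exact values key the chart exposes may differ):

spec:
  dnsConfig:
    options:
      # Lower ndots so a name like github.com is resolved as-is
      # instead of first being tried against every search domain.
      - name: ndots
        value: "1"

With this applied, the container's /etc/resolv.conf should show options ndots:1.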

fdlk commented

We deployed the updated chart, and I can see the suggested workaround applied when I shell into the FusionAuth container and run cat /etc/resolv.conf:

nameserver 10.43.0.10
search fusion-auth.svc.cluster.local svc.cluster.local cluster.local kuber.local webservice.local
options ndots:1

However, we just got another UnknownHostException, so the workaround is not doing its job.

I'm going to close this now. The FusionAuth image uses Ubuntu instead of Alpine these days.