Unable to install Maesh on AWS EKS v1.17 due to a CoreDNS issue
0rax opened this issue · 7 comments
Bug Report
What did you do?
Installed traefik-maesh from Helm on an AWS EKS v1.17 (eks.3) cluster with Calico networking using:
helm repo add traefik-mesh https://helm.traefik.io/mesh
helm repo update
helm install traefik-mesh traefik-mesh/traefik-mesh
What did you expect to see?
I was expecting the controller to start and maesh to be working.
What did you see instead?
The traefik-maesh-controller pod went into CrashLoopBackOff due to an issue with the traefik-maesh-prepare container. The issue seems to be that the CoreDNS version is rejected as incompatible with Maesh, though it should be supported (CoreDNS 1.3+).
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  13m                   default-scheduler  Successfully assigned default/traefik-mesh-controller-5f48ff8f69-vrbd9 to xxx.compute.internal
  Normal   Pulled     11m (x5 over 13m)     kubelet            Container image "traefik/mesh:v1.4.0" already present on machine
  Normal   Created    11m (x5 over 13m)     kubelet            Created container traefik-mesh-prepare
  Normal   Started    11m (x5 over 13m)     kubelet            Started container traefik-mesh-prepare
  Warning  BackOff    2m51s (x49 over 13m)  kubelet            Back-off restarting failed container
Output of prepare container log (traefik/mesh:v1.4.0):
2020/10/28 19:16:35 command prepare error: unable to find suitable DNS provider: unsupported CoreDNS version "1.6.6-eksbuild.1"
What is your environment & configuration (arguments, provider, platform, ...)?
- Kubernetes version: v1.17.9-eks-a84824
- EKS version: v1.17-eks.3
- Calico version: v3.16.4
- Maesh version: v1.4.0
@0rax Thanks for your interest in Traefik Mesh!
It appears that the issue comes from one of our dependencies: https://github.com/hashicorp/go-version.
Before patching the DNS configuration, we make sure the CoreDNS version is >= 1.3 and < 1.8. But go-version constraints consider that a version with a pre-release never matches a constraint specified without a pre-release, so "1.6.6-eksbuild.1" is rejected.
An issue is already open on their repository to understand why it behaves like this: hashicorp/go-version#59
Until this gets sorted, we can replace goversion.NewConstraint(">= 1.3, < 1.8") with version.GreaterThanOrEqual and version.LessThan checks. With this type of comparison, pre-releases are handled correctly.
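To illustrate why an explicit bounds comparison accepts the EKS build string, here is a minimal standard-library sketch. It is not the go-version implementation and not the Traefik Mesh code: parseCore, compareCore and inSupportedRange are hypothetical names, and as a simplification the sketch ignores the pre-release suffix entirely, whereas a full semver comparison would order a pre-release just below its release.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore extracts the numeric major.minor.patch segments of a version
// string, discarding any pre-release ("-...") or build ("+...") suffix.
// Hypothetical helper, not part of hashicorp/go-version.
func parseCore(v string) ([3]int, error) {
	var core [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) > 3 {
		return core, fmt.Errorf("malformed version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return core, fmt.Errorf("malformed version segment %q", p)
		}
		core[i] = n // missing segments stay 0, so "1.7" is read as 1.7.0
	}
	return core, nil
}

// compareCore returns -1, 0 or 1 when a is lower than, equal to or
// higher than b, comparing segment by segment.
func compareCore(a, b [3]int) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// inSupportedRange reports whether v satisfies >= 1.3 and < 1.8,
// the range quoted in this thread.
func inSupportedRange(v string) (bool, error) {
	core, err := parseCore(v)
	if err != nil {
		return false, err
	}
	lower := [3]int{1, 3, 0}
	upper := [3]int{1, 8, 0}
	return compareCore(core, lower) >= 0 && compareCore(core, upper) < 0, nil
}

func main() {
	for _, v := range []string{"1.6.6-eksbuild.1", "1.2.6", "1.8.0"} {
		ok, err := inSupportedRange(v)
		fmt.Println(v, ok, err)
	}
}
```

With this kind of check, "1.6.6-eksbuild.1" falls inside the supported range because only its numeric segments are compared against the bounds.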
Thank you for your quick answer. This seems like an issue that could be easily fixed.
I will try to build a custom version of the Docker image with this fix to properly check Maesh compatibility with my setup.
@0rax Could you base your changes on v1.4? Since it's a bug fix it would be great to have it on this version.
Don't hesitate to ping me if you need help on this.
Using this patch on top of refs/tags/v1.4.0, I was able to start traefik-mesh successfully.
diff --git a/pkg/dns/dns.go b/pkg/dns/dns.go
index c62d46d..0416b87 100644
--- a/pkg/dns/dns.go
+++ b/pkg/dns/dns.go
@@ -39,7 +39,11 @@ const (
 	traefikMeshBlockTrailer = "#### End Traefik Mesh Block"
 )
 
-var versionCoreDNS17 = goversion.Must(goversion.NewVersion("1.7"))
+var (
+	versionCoreDNS17 = goversion.Must(goversion.NewVersion("1.7"))
+	versionCoreDNS13 = goversion.Must(goversion.NewVersion("1.3"))
+	versionCoreDNS18 = goversion.Must(goversion.NewVersion("1.8"))
+)
 
 // Client holds the client for interacting with the k8s DNS system.
 type Client struct {
@@ -103,7 +107,7 @@ func (c *Client) coreDNSMatch(ctx context.Context) (bool, error) {
 		return false, err
 	}
 
-	if !versionConstraint.Check(version) {
+	if !(version.GreaterThanOrEqual(versionCoreDNS13) && version.LessThan(versionCoreDNS18)) {
 		c.logger.Debugf("CoreDNS version is not supported, must satisfy %q, got %q", versionConstraint, version)
 		return false, fmt.Errorf("unsupported CoreDNS version %q", version)
Quick note: I had to create a namespace myself, as the current Helm chart seems to install into the default namespace by default. This seems inconsistent with the documentation at https://doc.traefik.io/traefik-mesh/install/#verify-your-installation, which says to check the installation using the traefik-mesh namespace.
For people interested in how I deployed it after patching the code, I ran the following commands:
make
docker tag traefik/mesh:latest XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh:v1.4.0-eks
docker push XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh:v1.4.0-eks
echo "---
apiVersion: v1
kind: Namespace
metadata:
  name: traefik-mesh" | kubectl apply -f -
helm install traefik-mesh traefik-mesh/traefik-mesh \
  --set controller.image.pullPolicy=IfNotPresent \
  --set controller.image.name=XXXXXXX.dkr.ecr.eu-west-3.amazonaws.com/traefik-mesh \
  --set controller.image.tag=v1.4.0-eks \
  --namespace=traefik-mesh
@0rax This patch sounds good 👍
Could you please open a Pull Request to contribute the changes upstream? We will make sure to release a patch version on the v1.4.
Thanks again for your time on this.