aws/amazon-vpc-cni-k8s

Hardcoded ConfigMap name: amazon-vpc-cni

pawelpodbielski opened this issue · 19 comments

What happened:
In the chart, the ConfigMap name is hardcoded to amazon-vpc-cni: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/charts/aws-vpc-cni/templates/configmap.yaml#L15

This causes problems if two HelmReleases of aws-vpc-cni are deployed in the same namespace with different names.

Source: https://github.com/aws/amazon-vpc-cni-k8s/pull/2517/files#diff-667ab2d37d1d54a8c53c92c0ed8544837ff52bc8cb4620ee1bc3d42135bf2f8e in V1.14

What you expected to happen:
The ConfigMap name should be configurable, e.g. amazon-{{ include "aws-vpc-cni.fullname" . }}
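For illustration, the template's metadata could look something like the following; this is only a sketch reusing the chart's existing aws-vpc-cni.fullname helper, not the current template:

apiVersion: v1
kind: ConfigMap
metadata:
  # hypothetical: derive the name from the release instead of hardcoding "amazon-vpc-cni"
  name: amazon-{{ include "aws-vpc-cni.fullname" . }}
  namespace: {{ .Release.Namespace }}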

How to reproduce it (as minimally and precisely as possible):
Deploy two HelmReleases of aws-vpc-cni in the same namespace with different names.
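For example (a sketch; the release names are placeholders and kube-system is assumed as the target namespace):

# the first release installs fine
helm install vpc-cni-a eks/aws-vpc-cni -n kube-system

# the second release in the same namespace is expected to fail, because both
# releases render a ConfigMap with the fixed name amazon-vpc-cni
helm install vpc-cni-b eks/aws-vpc-cni -n kube-system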

Environment:

  • CNI Version 1.14

@ppodbielski I am confused: why would there be two Helm releases of aws-vpc-cni in the same namespace with different names? The error you are describing sounds like it would happen with any Kubernetes resource.

@jdn5126
In my configuration, I have 6 subnets and 3 AZs. Due to a known limitation:

AWS VPC CNI does not support adding a second subnet for the same AZ. ENIConfig only allows setting one subnet; the upstream issue to add additional subnets, ENI Config Label/Annotation Def Concatenation aws/amazon-vpc-cni-k8s#1207, has been stale for a long time.

To work around this problem I deployed 2 vpc-cni HelmReleases in the same (system-specific) namespace. It has worked for me until now.

If I use the fullname amazon-vpc-cni, the other ConfigMap will have the same name as the one added in 1.14, which leads to a conflict:

helm template amazon-vpc-cni eks/aws-vpc-cni --set cniConfig.enabled=true --set fullnameOverride=amazon-vpc-cni | grep ConfigMap -A 2
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
--
kind: ConfigMap
metadata:
  name: amazon-vpc-cni

IMO it is just bad practice to hardcode a resource name.

@ppodbielski I am not sure how two VPC CNI installations could work, but for the ENIConfig enhancement, this is a task that we are currently working on. I am also curious why you are using the first ConfigMap to begin with. That ConfigMap is used for overriding the conflist, and it requires Values.cniConfig.enabled to be set to even be rendered. If you want to continue with this double installation, and you need the first ConfigMap for some reason, can you set fullnameOverride (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/charts/aws-vpc-cni/templates/_helpers.tpl#L14)?
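For example, something along these lines should rename the conflist ConfigMap while the second ConfigMap keeps its fixed name (a sketch, with a placeholder override value):

helm template my-cni eks/aws-vpc-cni \
  --set cniConfig.enabled=true \
  --set fullnameOverride=my-cni | grep ConfigMap -A 2
# expected: the conflist ConfigMap is named my-cni,
# while the second ConfigMap is still named amazon-vpc-cni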

@jkotiuk a fixed name is required so that controllers know which ConfigMap they need to watch. That is why Values.fullnameOverride is provided for the first ConfigMap, the one used to override the conflist.

@jdn5126
My configuration (each release has its own set of nodes to launch):

release:

  fullnameOverride: "aws-node"
  eniConfig:
    create: true
    region: AWS_REGION
    subnets:
      pods-subnet-1a:
        id: PODS_SUBNET_A
        securityGroups:
          - POD_SG
      pods-subnet-1b:
        id: PODS_SUBNET_B
        securityGroups:
          - POD_SG
      pods-subnet-1c:
        id: PODS_SUBNET_C
        securityGroups:
          - POD_SG
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "NodeA"
                  operator: In
                  values:
                    - "true"
                    

release1:

  fullnameOverride: "aws-node-1"
  eniConfig:
    create: true
    region: AWS_REGION
    subnets:
      pods-subnet-1a:
        id: PODS_SUBNET_1
        securityGroups:
          - POD_SG
      pods-subnet-1b:
        id: PODS_SUBNET_2
        securityGroups:
          - POD_SG
      pods-subnet-1c:
        id: PODS_SUBNET_3
        securityGroups:
          - POD_SG
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "NodeA"
                  operator: NotIn
                  values:
                    - "true"

I reported this for the Helm chart as well here: aws/eks-charts#1003, with the use case for multiple installs listed there, but to repeat it here:

We want to be able to have more control over how we roll out CNI upgrades on our clusters, since we've had problems with upgrades in the past causing breakage, so we have two parallel installs of the CNI, and use pod affinity rules so that the pods don't run on the same nodes. We have one set of nodes dedicated as "canary" nodes which run one install, and the other install runs on everything but the canary nodes. That way we can upgrade and change the configuration on the canary nodes independently of the non-canary nodes.

Thanks for the input @sarahhodne. I am a bit confused by this method of continuous integration. Rather than pinning which nodes the DaemonSets run on, can you deploy the new version in a staging cluster? That is the recommended way to handle upgrades, rather than all this complexity. As for the ConfigMap, the amazon-vpc-cni ConfigMap is required by the Network Policy Controller, and its name must be fixed.

As a side note, the current RollingUpdate strategy prevents more than 10% of aws-node pods from being unavailable during an update, and an unavailable aws-node pod only affects new pods on that node. Just mentioning for knowledge's sake.
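For reference, that strategy corresponds to a DaemonSet spec roughly like the following (a sketch of the relevant fields only, not the exact manifest shipped with the chart):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # at most 10% of aws-node pods may be unavailable during an update
      maxUnavailable: "10%"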

amazon-vpc-cni is a ConfigMap resource that is referred to by multiple controllers running on the EKS control plane (Network Policy controller, VPC Resource Controller) to check for config/feature flags. We will not be able to change the name via a flag, and even if we did, we would run into the same issue with the above use case, as the controllers can only refer to one ConfigMap.

While I understand the use case above, that is not a pattern officially supported by the charts we vend via eks-charts. Why can't we customize the second installation, remove the amazon-vpc-cni ConfigMap from the second chart, and let the second installation use the same ConfigMap?

I'm not a big fan of aws/eks-charts#1012, as we don't want to end up with a flag for each and every helm resource and field that this comes up for. Since this pattern is not one that we recommend or support, I suggest mirroring or forking the helm chart for your use case.

Another option is to render the manifest from the helm chart, then remove the ConfigMap, then apply it.
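A rough sketch of that approach, assuming yq v4 is available to filter the rendered multi-document manifest (release name and namespace are placeholders):

helm template my-cni eks/aws-vpc-cni -n kube-system \
  | yq eval 'select(.kind != "ConfigMap" or .metadata.name != "amazon-vpc-cni")' - \
  | kubectl apply -f -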

In my case, we were running a single instance of this chart, we had 3 subnets, and everything worked great.

At some point we ran out of IP addresses, so we added a secondary CIDR to the VPC and three more subnets. Since a single deployment does not support more than one subnet per AZ, we had to add another deployment and use nodeSelectors to control where each chart is deployed.

Now we have to add a Helm post-renderer to remove this hardcoded ConfigMap from one of the charts.
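For anyone hitting the same thing, the post-renderer can be a small script that drops the hardcoded ConfigMap from the rendered manifests on stdin (a sketch, again assuming yq v4; the script name is made up):

#!/usr/bin/env bash
# strip-cni-configmap.sh: reads rendered manifests on stdin, drops the
# hardcoded amazon-vpc-cni ConfigMap, and writes everything else to stdout
yq eval 'select(.kind != "ConfigMap" or .metadata.name != "amazon-vpc-cni")' -

and then something like:

helm upgrade --install aws-vpc-cni eks/aws-vpc-cni -n kube-system \
  --post-renderer ./strip-cni-configmap.sh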

What would be a proper solution to support 6 subnets in 3 AZs using one vpc-cni deployment?

@owengo I definitely understand the challenge of choosing MINIMUM and WARM targets in a cluster where some nodes run a small number of pods and others run a large number, as it is a tradeoff between provisioning excess IPs and being able to scale quickly. This is a challenge we have been actively trying to find a good solution for, given that Kubernetes DaemonSets do not allow different values on different nodes.

Customers usually do one of three things:

  1. Set a small WARM_IP_TARGET and prioritize IP address conservation over rapid scaling.
  2. Set a high MINIMUM_IP_TARGET for scale and accept the excess provisioning on nodes running few pods.
  3. Choose small instances (or instances with a small number of ENIs: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/vpc/vpc_ip_resource_limit.go) for the nodes that run few pods. You can set MINIMUM_IP_TARGET to 200, and if the instance can only support 16 IPs, then it will just allocate 16 and stop.

Or, the long-term solution is that they move to IPv6 clusters and no longer have to worry about any of this.
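To illustrate options 1 and 2 from the list above: both knobs are environment variables on aws-node, which with the Helm chart is roughly the following (a sketch with placeholder numbers, assuming the chart's env values map; the two variables can also be combined):

# values.yaml (sketch)
env:
  # option 1: conserve IPs by keeping only a small warm pool per node
  WARM_IP_TARGET: "2"
  # option 2: pre-provision for scale, accepting excess IPs on lightly loaded nodes
  MINIMUM_IP_TARGET: "30"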

For your issue, I don't think deploying multiple Helm charts is a good solution. It is not how DaemonSets are designed to be deployed, and it is not how this chart is meant to be deployed.

@jkotiuk I assume you are talking about using Custom Networking and wanting to support multiple subnets for an ENIConfig. If so, this is something we are working on and plan to have an update for soon. It is not a great workaround, but once you exhaust a subnet, you can update the ENIConfig with the new subnet (replacing the old one) and nodes will start to pull from that subnet.
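That workaround amounts to editing the ENIConfig for the affected AZ in place, roughly like this (subnet and security group IDs are placeholders):

apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  # one ENIConfig per availability zone, matched via the node's ENIConfig label or annotation
  name: us-west-2a
spec:
  # replace the exhausted subnet ID with the new one; only a single subnet is supported today
  subnet: subnet-0new0000000000000
  securityGroups:
    - sg-0abc0000000000000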

This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 14 days.

I am going to close this issue since we cannot change the name of the ConfigMap, but note that we are working diligently on improving the IP management experience. #2714 is a start toward handling IP exhaustion without any additional config, and next is offering config-less handling of the warm pool.

โš ๏ธCOMMENT VISIBILITY WARNINGโš ๏ธ

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@achevuru,
I have to say that this is a big issue for us too ... we wanted to move to this chart, but cannot because we also have a situation where we run multiple aws-node DaemonSets for different networking configurations. It seems pretty silly to have hardcoded resource names. Can you elaborate on why it is critical that the ConfigMap has a hardcoded name?

@diranged as explained above, the VPC Resource Controller has to know which ConfigMap to look at, so the name must be fixed. The real solution here is for us to eliminate all of the use cases for deploying multiple helm charts.

I don't think that's a reasonable path forward, as you will never predict each and every use case. I think the right answer here is to pass the ConfigMap name into the controller so that it knows which one to look at, rather than hardcoding anything.

@diranged can you explain how that solves the problem?