Hardcoded ConfigMap name: amazon-vpc-cni
pawelpodbielski opened this issue · 19 comments
What happened:
In the chart, the ConfigMap name is hardcoded to amazon-vpc-cni: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/charts/aws-vpc-cni/templates/configmap.yaml#L15
This causes problems if two helm releases of aws-vpc-cni are deployed in the same namespace with different names.
Source: https://github.com/aws/amazon-vpc-cni-k8s/pull/2517/files#diff-667ab2d37d1d54a8c53c92c0ed8544837ff52bc8cb4620ee1bc3d42135bf2f8e in v1.14
What you expected to happen:
ConfigMap name should be configurable, e.g: amazon-{{ include "aws-vpc-cni.fullname" . }}
How to reproduce it (as minimally and precisely as possible):
Deploy two helm releases of aws-vpc-cni in the same namespace with different names.
Environment:
- CNI Version 1.14
@ppodbielski I am confused, why would there be two helm releases of aws-vpc-cni in the same namespace with different names? The error you are describing sounds like it would happen with any Kubernetes resource.
@jdn5126
In my configuration, I have 6 subnets and 3 AZs. Due to a known limitation, AWS CNI does not support adding a second subnet for the same AZ: ENIConfig only allows setting up one subnet. There is an upstream issue to add additional subnets, but it has been stale for a long time: ENI Config Label/Annotation Def Concatenation aws/amazon-vpc-cni-k8s#1207
To work around this problem I deployed two vpc-cni Helm releases in the same (system-specific) namespace. It has worked for me until now.
If I want to use the fullname amazon-vpc-cni, the other ConfigMap will have the same name as the one added in 1.14, which will lead to a conflict.
helm template amazon-vpc-cni eks/aws-vpc-cni --set cniConfig.enabled=true --set fullnameOverride=amazon-vpc-cni | grep ConfigMap -A 2
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
--
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
IMO it is simply bad practice to hardcode a resource name.
@ppodbielski I am not sure how two VPC CNI installations could work, but as for the ENIConfig enhancement, this is a task that we are currently working on. I am also curious why you are using the first ConfigMap to begin with. That ConfigMap is used for overriding the conflist, and it requires Values.cniConfig.enabled to be set to even be rendered. If you want to continue with this double installation, and you need the first ConfigMap for some reason, can you set fullnameOverride (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/charts/aws-vpc-cni/templates/_helpers.tpl#L14)?
@jkotiuk a fixed name is required so that controllers know which ConfigMap they need to watch. That is why Values.fullnameOverride is provided for the first ConfigMap, which is used to override the conflist.
@jdn5126
My configuration (each release has its own set of nodes to launch):
release:
  fullnameOverride: "aws-node"
  eniConfig:
    create: true
    region: AWS_REGION
    subnets:
      pods-subnet-1a:
        id: PODS_SUBNET_A
        securityGroups:
          - POD_SG
      pods-subnet-1b:
        id: PODS_SUBNET_B
        securityGroups:
          - POD_SG
      pods-subnet-1c:
        id: PODS_SUBNET_C
        securityGroups:
          - POD_SG
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "NodeA"
                operator: In
                values:
                  - "true"
release1:
  fullnameOverride: "aws-node-1"
  eniConfig:
    create: true
    region: AWS_REGION
    subnets:
      pods-subnet-1a:
        id: PODS_SUBNET_1
        securityGroups:
          - POD_SG
      pods-subnet-1b:
        id: PODS_SUBNET_2
        securityGroups:
          - POD_SG
      pods-subnet-1c:
        id: PODS_SUBNET_3
        securityGroups:
          - POD_SG
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "NodeA"
                operator: NotIn
                values:
                  - "true"
I reported this for the Helm chart as well here: aws/eks-charts#1003, with the use case for multiple installs listed there, but to repeat it here:
We want to be able to have more control over how we roll out CNI upgrades on our clusters, since we've had problems with upgrades in the past causing breakage, so we have two parallel installs of the CNI, and use pod affinity rules so that the pods don't run on the same nodes. We have one set of nodes dedicated as "canary" nodes which run one install, and the other install runs on everything but the canary nodes. That way we can upgrade and change the configuration on the canary nodes independently of the non-canary nodes.
Thanks for the input @sarahhodne. I am a bit confused by this method of continuous integration. Rather than fixing which nodes the daemonsets will run on, can you deploy the new version in a staging cluster? That's the recommended way to handle upgrades, rather than all this complexity. As for the ConfigMap, the amazon-vpc-cni ConfigMap is required by the Network Policy Controller, and its name must be fixed.
As a side note, the current RollingUpdate strategy prevents more than 10% of aws-node pods from being unavailable during an update, and an aws-node pod being unavailable only affects new pods. Just mentioning for knowledge's sake.
amazon-vpc-cni is a ConfigMap resource that is referred to by multiple controllers running on the EKS control plane (Network Policy controller, VPC resource controller) to check for config/feature flags. We will not be able to change the name via some flag, and even if we did, we would run into the same issue with the above use case, as the controllers can only refer to one ConfigMap.
While I understand the use case above, that is not a pattern officially supported by the charts we vend via eks-charts. Why can't we customize the second installation, remove the amazon-vpc-cni ConfigMap from the second chart, and let the second installation use the same CM?
I'm not a big fan of aws/eks-charts#1012, as we don't want to end up with a flag for each and every helm resource and field that this comes up for. Since this pattern is not one that we recommend or support, I suggest mirroring or forking the helm chart for your use case.
Another option is to render the manifest from the helm chart, then remove the ConfigMap, then apply it.
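As a minimal sketch of that render-filter-apply approach (the function name and string matching are my own simplifications, not an official tool; a real post-renderer should use a YAML-aware tool such as yq or kustomize rather than substring checks):

```python
# Hypothetical Helm post-renderer sketch: drop the hardcoded ConfigMap
# from a rendered manifest before applying it. The substring matching
# below is a simplification for illustration, not a full YAML parser.

def drop_configmap(manifest: str, name: str = "amazon-vpc-cni") -> str:
    """Remove any YAML document that is a ConfigMap with the given name."""
    kept = []
    for doc in manifest.split("\n---\n"):
        if "kind: ConfigMap" in doc and f"name: {name}" in doc:
            continue  # skip the conflicting ConfigMap
        kept.append(doc)
    return "\n---\n".join(kept)

# Usage idea: helm template ... | <this filter> | kubectl apply -f -
rendered = """kind: ConfigMap
metadata:
  name: amazon-vpc-cni
---
kind: DaemonSet
metadata:
  name: aws-node"""
print(drop_configmap(rendered))  # only the DaemonSet document remains
```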
In my case, we were running a single instance of this chart with 3 subnets, and everything worked great.
At some point, we ran out of IP addresses, added a secondary CIDR to the VPC, and added three more subnets. As a single deployment does not support more than one subnet per AZ, we had to add another deployment and control via nodeSelectors where each chart is deployed.
Now, we have to add a Helm post-renderer to remove this hardcoded ConfigMap from one of the charts.
What would be a proper solution to support 6 subnets in 3 AZ using one vpc-cni deployment?
@owengo I definitely understand the challenge of choosing MINIMUM and WARM targets in a cluster where some nodes run a small number of pods and other nodes run a large number of pods, as it is a tradeoff between provisioning excess IPs and being able to scale quickly. This is a challenge that we have been actively trying to figure out a good solution for, given Kubernetes Daemonsets do not allow for different values on different nodes.
Customers usually do one of three things:
- Set a small WARM_IP_TARGET and prioritize IP address conservation over rapid scaling.
- Set a high MINIMUM_IP_TARGET for scale and accept the excess provisioning on nodes running few pods.
- Choose small instances (or instances with a small number of ENIs: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/vpc/vpc_ip_resource_limit.go) for the nodes that run few pods. You can set MINIMUM_IP_TARGET to 200, and if the instance can only support 16 IPs, then it will just allocate 16 and stop.
Or, the long-term solution is that they move to IPv6 clusters and no longer have to worry about any of this.
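For reference, the first two options correspond to env settings on the CNI daemonset, settable through the chart's values; the numbers in this sketch are illustrative only, not recommendations:

```yaml
# Illustrative aws-vpc-cni chart values; the env var names are the CNI's
# documented knobs, but the numeric values here are examples only.
env:
  # Option 1: conserve IPs at the cost of slower pod scale-up
  WARM_IP_TARGET: "2"
  # Option 2 (alternative): provision for scale, accept excess IPs
  # MINIMUM_IP_TARGET: "200"
```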
For your issue, I don't think deploying multiple helm charts is a good solution. It is not how Daemonsets are designed to be deployed, and it is not how this chart is meant to be deployed.
@jkotiuk I assume you are talking about using Custom Networking and wanting to support multiple subnets for an ENIConfig. If so, this is something we are working on and plan to have an update for soon. It is not a great workaround, but once you exhaust a subnet, you can update the ENIConfig with the new subnet (replacing the old one) and nodes will start to pull from that subnet.
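As a sketch of that workaround, replacing the exhausted subnet in an ENIConfig looks roughly like the following (the AZ name, subnet ID, and security group ID are placeholders):

```yaml
# Hypothetical ENIConfig for one AZ; to apply the workaround, replace
# the exhausted subnet ID in spec.subnet with the new subnet's ID.
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a            # matches the node's AZ (or ENIConfig label/annotation)
spec:
  subnet: subnet-0newsubnet   # placeholder: the replacement subnet ID
  securityGroups:
    - sg-0examplegroup        # placeholder security group ID
```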
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
I am going to close this issue since we cannot change the name of the ConfigMap, but note that we are working diligently on improving the IP management experience. #2714 is a start to handling IP exhaustion without any additional config, and next is offering config-less handling of the warm pool.
⚠️ COMMENT VISIBILITY WARNING ⚠️
Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
@achevuru,
I have to say that this is a big issue for us too ... we wanted to move to this chart, but cannot because we also have a situation where we run multiple aws-node DaemonSets for different networking configurations. It seems pretty silly to have hard-coded resource names. Can you elaborate on why it is critical that the ConfigMap is hard-coded with a name?
@diranged as explained above, the VPC Resource Controller has to know which ConfigMap to look at, so the name must be fixed. The real solution here is for us to eliminate all of the use cases for deploying multiple helm charts.
I don't think that's a reasonable path forward - as you will never predict each and every use case. I think that the right answer here is to pass the config map name into the controller, so that it knows which one to look at, rather than hard-coding anything.