Custom Networking ENIs fail to attach in v1.13.0 when no security groups are specified
jdn5126 opened this issue · 4 comments
What happened:
In v1.13.0, #2354 was introduced, which improved custom networking startup time by attaching custom ENIs during node initialization. This introduced a problem when no security group is associated with the ENIConfig object: in that case, we should fall back to the security group assigned to the node's primary ENI.
However, the primary ENI's security group comes from the EC2 metadata cache, and that value is not synced until after node initialization: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.13.0/pkg/ipamd/ipamd.go#L450
Our retry strategy in v1.13.0 is to let the aws-node pod crash so that the node does not become ready for pods to be scheduled. When the aws-node pod restarts, we hit the same issue, because the cache never syncs the primary ENI's security group ID until after node initialization.
Attach logs
{"level":"info","ts":"2023-06-15T21:55:31.095Z","caller":"ipamd/ipamd.go:903","msg":"Found ENI Config Name: us-west-2b"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"ipamd/ipamd.go:873","msg":"ipamd: using custom network config: [], subnet-07f48a842d5d33cee"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Using a custom network config for the new ENI"}
{"level":"warn","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"No custom networking security group found, will use the node's primary ENI's SG: []"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Creating ENI with security groups: [] in subnet: subnet-07f48a842d5d33cee"}
{"level":"error","ts":"2023-06-15T21:55:31.299Z","caller":"awsutils/awsutils.go:763","msg":"Failed to CreateNetworkInterface InvalidParameterValue: user [REDACTED] does not own a resource\n\tstatus code: 400, request id: e12e797e-19f5-47fa-8d8e-4a5de5a677b7"}
What you expected to happen:
Custom ENIs should be attached during node initialization, and there should be a better retry strategy.
How to reproduce it (as minimally and precisely as possible):
- Deploy VPC CNI v1.13.0
- Create an ENIConfig with no security groups
- See error
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): does not matter
- CNI Version: v1.13.0
- OS (e.g: cat /etc/os-release): Amazon Linux 2
- Kernel (e.g. uname -a): does not matter
The workaround for this issue in v1.13.0 is to specify any security group ID when defining an ENIConfig. We recommend choosing a no-op security group, such as the cluster security group or the VPC's default security group.
+1
Observed the same issue. The workaround helps to get the nodes to join the cluster.
Closing, as this is fixed in the v1.13.2 release.