aws/amazon-vpc-cni-k8s

2-3x increase in GetMetadata API calls from 1.16.2 -> 1.16.3 release

diranged opened this issue · 5 comments

What happened:
This morning we upgraded from 1.16.2 to 1.16.3. While there are no errors or problems, I noticed a sharp increase in both the latency and the count of GetMetadata API calls coming from the CNI pods:

(The graph plots a rate, using this query:)

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)


Here's the total call count (also plotted as a rate):

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
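A note on reading the first graph: the rate of `awscni_aws_api_latency_ms_sum` rises whenever either the number of calls or the time per call rises. With call volume up 2-3x, that graph cannot by itself show that individual calls got slower. The sketch below computes a per-call average by dividing one rate by the other. It assumes the metric exposes the `_sum` and `_count` series used in the queries above, and it reuses the same Grafana variables (`$daemonset`, `$node`, `$__rate_interval`) and grouping so that the two sides match label-for-label:

```promql
# Average latency per call (ms) = rate of summed latency / rate of call count.
# Both sides use the same label filters and grouping, so they match one-to-one.
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
/
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
```

If this ratio stays roughly flat across the upgrade, the first graph is rising because of the extra calls, not because each call is slower.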

(Screenshot taken 2024-02-26 at 1:06:42 PM)

Is this expected or known?

Environment:
Kubernetes: 1.28
CNI: 1.16.3
OS: Bottlerocket 1.17.0

@diranged this is definitely not a known issue. Did the volume of these calls change at all? Does reverting to v1.16.2 immediately resolve this issue?

@diranged v1.16.4 is now released. Can you try this release?

@diranged Do you see the same behavior with the v1.16.4 release? This call volume could have come from a transitive dependency call, and we wanted to verify that.

Versions v1.16.4 and now v1.17.1 are available, and we haven't seen any other reports of this. Closing as fixed.
