2-3x increase in GetMetadata API calls from 1.16.2 -> 1.16.3 release

Question

diranged opened this issue 4 months ago · 5 comments

What happened:
This morning we upgraded from 1.16.2 to 1.16.3. While there are no errors or problems, I noticed a sharp increase in both the latency and the count of GetMetadata API calls coming from the CNI pods:

(Graph is a rate ...)

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)

Here's the count of total calls:

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
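To separate "each call got slower" from "there are simply more calls", the two rates above can be divided to get an approximate mean latency per call. This is only a sketch: it reuses the metric names, label filters, and dashboard variables from the queries above, and assumes the `_sum` and `_count` series belong to the same summary or histogram metric, as their names suggest.

```promql
# Approximate mean latency per call (ms) over the rate window.
# Assumption: awscni_aws_api_latency_ms_sum and _count share the same label set.
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
/
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
```

If this ratio stays flat across the upgrade while the count rate rises, the growth in the first graph would be explained by call volume rather than by slower individual calls.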

Is this expected or known?

Environment:
Kubernetes: 1.28
CNI: 1.16.3
OS: BottleRocket 1.17.0

Answer 1 · 2024-02-26T21:09:58.000Z

@diranged this is definitely not a known issue. Did the volume of these calls change at all? Does reverting to v1.16.2 immediately resolve this issue?

Answer 2 · 2024-03-05T21:07:04.000Z

@diranged v1.16.4 is now released. Can you try this release?

Answer 3 · 2024-03-14T22:12:48.000Z

@diranged - Do you see the same behavior with the v1.16.4 release? This call volume could have come from a transitive dependency call, and we wanted to verify that.

Answer 4 · 2024-03-19T15:40:41.000Z

Later versions v1.16.4 and now v1.17.1 are available, and we haven't seen any reports of this. Closing as fixed.

Answer 5 · 2024-03-19T15:41:00.000Z

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.