fluxcd/notification-controller

Metrics has a potential cardinality issue

Kuzbekov opened this issue · 4 comments

By nature, our webhooks are usually open for external connections, especially when the code is on cloud-based repositories. If somebody tries to scan the hostname for webhook directed to the webhook receiver for vulnerabilities and attempts to gain access to multiple paths, it will generate a lot of metrics records and lead to cardinality explosion on the Prometheus server because the label "handle" stores the path from each new request due to dynamic handler naming.
It can be mitigated by making ingress with static path, but it's not mentioned in documentation.
Relevant information is here

We would need a new flag for the controller and changes in

h := std.Handler("", mdlw, mux)
and
h := std.Handler("", mdlw, mux)

Are you willing to contribute the change?

I would like to contribute. Do you think it would be better to implement a flag with a default value that ensures safety? This change would necessitate notifying users due to its potential impact. Currently, someone could theoretically use precise webhook values in the 'handle' label within alerts or metrics. Or we should keep current behavior?

we have to keep the default behavior. Will should add recommendation in the documentation.