[Failed-Request-Alert-Tuning] Investigate and Mitigate Excessive failed-request-too-high Alerts in the Alerting System
Closed this issue · 0 comments
Problem
The current Alerting System on Grafana is generating an excessive number of failed-request-too-high alerts. To keep the system reliable and actionable, we need to investigate the root cause of the elevated failure rates and assess whether the alerting thresholds or mechanisms need refinement.
Upon analysis, the eth_getBlockByHash and eth_getBlockByNumber endpoints have been identified as the primary drivers of the issue, accounting for most of the recurring errors observed in the system.
Solution
The proposed solution is to analyze the logs to determine the root cause of the failed requests and to identify the specific response codes being returned. Based on these findings, locate and fix the bug causing the failures. This should reduce the volume of noisy alerts and keep the alerting system focused on critical issues.
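As a starting point for the log analysis, here is a minimal sketch that tallies failed requests by method and response code. It assumes the logs are available as JSON lines with a `method` field and an integer `status` field. Both field names are hypothetical, since the actual log schema is not described in this issue, and the `status >= 400` cutoff is likewise an assumption about what counts as a failed request.

```python
import json
from collections import Counter
from typing import Iterable


def tally_failures(lines: Iterable[str]) -> Counter:
    """Count failed requests grouped by (method, status).

    Assumes each log line is a JSON object with hypothetical
    "method" and "status" fields; adjust to the real log schema.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        method = entry.get("method")
        status = entry.get("status")
        if method is None or not isinstance(status, int):
            continue  # skip entries missing the assumed fields
        if status >= 400:  # assumed definition of a failed request
            counts[(method, status)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input only, not real log data.
    sample = [
        '{"method": "eth_getBlockByHash", "status": 500}',
        '{"method": "eth_getBlockByNumber", "status": 200}',
    ]
    for (method, status), n in tally_failures(sample).most_common():
        print(f"{method}\t{status}\t{n}")
```

Sorting the result with `most_common()` shows which method and response code combinations dominate, which should help confirm whether eth_getBlockByHash and eth_getBlockByNumber are failing with a single response code or several.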
Alternatives
No response