hashgraph/hedera-json-rpc-relay

[Failed-Request-Alert-Tuning] Investigate and Mitigate Excessive failed-request-too-high Alerts in the Alerting System

Closed this issue · 0 comments

Problem

The current Alerting System on Grafana is generating an excessive number of failed-request-too-high alerts. To ensure the system remains reliable and actionable, it is crucial to investigate the root cause of the elevated failure rates and assess whether the alerting thresholds or mechanisms need refinement.

Upon analysis, the eth_getBlockByHash and eth_getBlockByNumber endpoints have been identified as the primary drivers of the issue, contributing significantly to the recurring errors observed in the system.

Solution

The proposed solution involves analyzing the logs to determine the root cause of the failed requests and identifying the specific response codes being returned. Based on these findings, address the underlying issues by locating and resolving the bug causing the failures. This approach will help mitigate the problem and effectively reduce the volume of white-noise alerts, ensuring the alerting system remains focused on critical issues.

Alternatives

No response

Tasks

Preview Give feedback
  1. bug
    quiet-node
  2. bug
    quiet-node
  3. enhancement
    quiet-node