onflow/flow-archive

Add metrics to Archive Node for performance monitoring and alerting

Closed this issue · 0 comments

Problem

DPS monitoring currently relies on log queries, so operational and performance issues can crop up and go undetected unless someone proactively checks the logs.

Solution

  1. Create a wish list of metrics needed for operating DPS. Suggested metrics: API endpoint metrics, block height indexed, block indexing rate, and latency.
  2. Create a basic Grafana dashboard to monitor DPS, along with some basic alarms.
  3. Investigate logging granularity and determine whether info-level logs are enough to diagnose and debug issues.
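
To make the wish list in step 1 concrete, here is a minimal sketch of a collector for the suggested metrics (block height indexed, block indexing rate, and per-endpoint request counts and latency). It uses only the Go standard library. The type `IndexMetrics`, its methods, and the endpoint name `example_endpoint` are hypothetical names chosen for illustration and are not part of the existing codebase. A real implementation would more likely expose these values through a metrics library that Grafana can scrape.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// IndexMetrics is a hypothetical in-memory collector for the metrics
// suggested in this issue. It is an illustration, not existing DPS code.
type IndexMetrics struct {
	mu            sync.Mutex
	height        uint64                   // highest block height indexed so far
	blocksIndexed uint64                   // number of blocks indexed
	requests      map[string]uint64        // request count per API endpoint
	totalLatency  map[string]time.Duration // summed latency per API endpoint
}

// NewIndexMetrics returns an empty collector.
func NewIndexMetrics() *IndexMetrics {
	return &IndexMetrics{
		requests:     make(map[string]uint64),
		totalLatency: make(map[string]time.Duration),
	}
}

// BlockIndexed records that a block at the given height was indexed.
func (m *IndexMetrics) BlockIndexed(height uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.blocksIndexed++
	if height > m.height {
		m.height = height
	}
}

// ObserveRequest records one API request and how long it took.
func (m *IndexMetrics) ObserveRequest(endpoint string, latency time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[endpoint]++
	m.totalLatency[endpoint] += latency
}

// Height returns the highest block height indexed so far.
func (m *IndexMetrics) Height() uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.height
}

// IndexingRate returns blocks indexed per second over the elapsed duration.
func (m *IndexMetrics) IndexingRate(elapsed time.Duration) float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if elapsed <= 0 {
		return 0
	}
	return float64(m.blocksIndexed) / elapsed.Seconds()
}

// MeanLatency returns the average latency observed for an endpoint,
// or zero if no requests were recorded for it.
func (m *IndexMetrics) MeanLatency(endpoint string) time.Duration {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := m.requests[endpoint]
	if n == 0 {
		return 0
	}
	return m.totalLatency[endpoint] / time.Duration(n)
}

func main() {
	m := NewIndexMetrics()
	for h := uint64(100); h < 110; h++ {
		m.BlockIndexed(h)
	}
	m.ObserveRequest("example_endpoint", 10*time.Millisecond)
	m.ObserveRequest("example_endpoint", 30*time.Millisecond)

	fmt.Println("height indexed:", m.Height())
	fmt.Println("blocks/sec over 5s:", m.IndexingRate(5*time.Second))
	fmt.Println("mean latency:", m.MeanLatency("example_endpoint"))
}
```

An alarm of the kind mentioned in step 2 could then fire when the indexed height stops advancing or the indexing rate drops below an agreed threshold. The thresholds themselves still need to be decided as part of the wish list.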