celestiaorg/celestia-core

Add new consensus counter metric to track missed blocks, deprecate missed blocks gauge.

jevonearth opened this issue · 0 comments

Feature Request

Summary

Introduce a new Prometheus counter metric named celestia_consensus_validator_missed_blocks_total and deprecate the existing gauge metric celestia_consensus_validator_missed_blocks to more accurately track changes over time.

Problem Definition

The celestia_consensus_validator_missed_blocks metric is currently exposed as a gauge. Gauges suit values that can go up and down, such as temperatures or amounts of free memory, but they are a poor fit for counting events that only ever increase, such as missed blocks. Prometheus's rate() and increase() functions are intended for counters and treat any drop in value as a counter reset. Gauges are normally queried with functions such as delta(), which make no such adjustment. Exposing a cumulative count as a gauge therefore signals the wrong query semantics to users and tooling, which can lead to less efficient monitoring and to inaccuracies in alerting or historical data analysis.

Adding a counter metric for missed blocks would let operators apply rate() and increase() directly, with counter resets (for example after a node restart) handled by Prometheus, and would more accurately reflect the operational health and performance trends of the validator. This change would also align with Prometheus best practices, which reserve the _total suffix for counters.
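
To illustrate why counter semantics matter here, the following is a minimal, standard-library-only Go sketch of the reset handling that Prometheus applies to counters. The function names (increase, naiveDelta) and the sample values are hypothetical. It is a simplification for illustration, not Prometheus's actual implementation, which also extrapolates to the edges of the query window.

```go
package main

import "fmt"

// increase returns how much a cumulative counter grew across the given
// samples (oldest first). Any drop between neighbouring samples is treated
// as a counter reset (for example a node restart), and the post-reset value
// is counted in full. Simplified: no window extrapolation is performed.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Counter reset: the series started again from zero.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

// naiveDelta is what a plain last-minus-first comparison reports when the
// series is treated as an ordinary gauge.
func naiveDelta(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	return samples[len(samples)-1] - samples[0]
}

func main() {
	// Hypothetical scrapes of the missed-blocks series. The node restarts
	// after the third scrape, so the value starts again from zero.
	samples := []float64{10, 12, 15, 0, 2, 3}

	fmt.Println(increase(samples))   // 8
	fmt.Println(naiveDelta(samples)) // -7
}
```

With reset handling the window reports 8 missed blocks (5 before the restart, 3 after), while the plain difference reports a meaningless negative value.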

Proposal

  1. Introduce a New Counter Metric: Implement celestia_consensus_validator_missed_blocks_total as a counter metric that increments each time a validator misses a block.

  2. Deprecate the Existing Gauge Metric: Mark celestia_consensus_validator_missed_blocks as deprecated in the codebase and documentation, encouraging users to transition to the new counter metric.

  3. Update Metric Descriptions: Revise the metric descriptions to explain the use of the new counter and the deprecation path for the existing gauge.

# HELP celestia_consensus_validator_missed_blocks (Deprecated) Total missed blocks for a validator. This metric is deprecated and will be removed in future versions. Please use celestia_consensus_validator_missed_blocks_total instead.
# TYPE celestia_consensus_validator_missed_blocks gauge
# HELP celestia_consensus_validator_missed_blocks_total Total number of blocks missed by the validator
# TYPE celestia_consensus_validator_missed_blocks_total counter
  4. Implementation Timeline: We (ECAD Labs) will submit a Pull Request with an implementation soon.
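
The deprecation path in the proposal could be sketched roughly as follows. This is a library-agnostic Go illustration, not the planned implementation: the metric and validatorMetrics types and the markMissedBlock and expose helpers are hypothetical in-memory stand-ins for whatever metrics library the codebase uses, and labels such as the validator address are omitted. It also assumes the deprecated gauge keeps receiving the same updates as the new counter until it is removed.

```go
package main

import "fmt"

// metric is a hypothetical in-memory stand-in for a single Prometheus series.
type metric struct {
	name  string
	kind  string // "counter" or "gauge"
	help  string
	value float64
}

// expose renders the series in the Prometheus text exposition layout.
func (m *metric) expose() string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s %s\n%s %g\n",
		m.name, m.help, m.name, m.kind, m.name, m.value)
}

// validatorMetrics holds both series during the deprecation window.
type validatorMetrics struct {
	missedBlocksTotal *metric // new counter
	missedBlocks      *metric // deprecated gauge, kept for existing dashboards
}

func newValidatorMetrics() *validatorMetrics {
	return &validatorMetrics{
		missedBlocksTotal: &metric{
			name: "celestia_consensus_validator_missed_blocks_total",
			kind: "counter",
			help: "Total number of blocks missed by the validator",
		},
		missedBlocks: &metric{
			name: "celestia_consensus_validator_missed_blocks",
			kind: "gauge",
			help: "(Deprecated) Total missed blocks for a validator.",
		},
	}
}

// markMissedBlock records one missed block on both series so that existing
// consumers of the gauge keep working while users migrate to the counter.
// Assumption: the gauge is updated in lockstep with the counter.
func (v *validatorMetrics) markMissedBlock() {
	v.missedBlocksTotal.value++
	v.missedBlocks.value++
}

func main() {
	m := newValidatorMetrics()
	m.markMissedBlock()
	m.markMissedBlock()
	fmt.Print(m.missedBlocks.expose())
	fmt.Print(m.missedBlocksTotal.expose())
}
```

Writing to both series from a single helper keeps the two values in agreement for the duration of the deprecation window, so removing the gauge later is a one-line change.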