Add a new consensus counter metric to track missed blocks and deprecate the missed blocks gauge
jevonearth opened this issue
Feature Request
Summary
Introduce a new Prometheus counter metric named celestia_consensus_validator_missed_blocks_total and deprecate the existing gauge metric celestia_consensus_validator_missed_blocks, so that missed blocks can be tracked more accurately over time.
Problem Definition
The current implementation uses a gauge for the celestia_consensus_validator_missed_blocks metric. Gauges suit values that can go up and down, such as temperatures or amounts of free memory, but they are not ideal for counting events that only accumulate, such as missed blocks. PromQL's rate() and increase() functions are intended for counters only: they treat any drop in value as a counter reset (for example, after a node restart) and correct for it. A gauge carries no such semantics, so answering "how many blocks were missed in this window?" requires extra query logic, which makes monitoring less efficient and can introduce inaccuracies in alerting and historical analysis.
Exposing missed blocks as a counter would let users apply rate() and increase() directly, with Prometheus handling counter resets, and would more accurately reflect the validator's operational health and performance trends. It would also align with Prometheus best practices, including the naming convention that counters end in _total.
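To illustrate why counter semantics matter here, the following standard-library Go sketch reproduces the reset-aware arithmetic that PromQL's increase() applies to counters. The Sample type, the increaseWithResets function and the sample values are hypothetical, chosen only for illustration. The real increase() also extrapolates to the edges of the query window, which this sketch deliberately omits.

```go
package main

import "fmt"

// Sample is one scraped value of a counter at a point in time (Unix seconds).
// Hypothetical type for illustration only.
type Sample struct {
	TS    int64
	Value float64
}

// increaseWithResets sums the growth of a counter across the given samples.
// Any drop in value is treated as a counter reset (e.g. the node restarted),
// so the post-reset value counts as growth from zero. This matches the reset
// handling of PromQL's increase(), but without window-edge extrapolation.
func increaseWithResets(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].Value, samples[i].Value
		if cur >= prev {
			total += cur - prev
		} else {
			// Reset detected: the counter restarted from zero.
			total += cur
		}
	}
	return total
}

func main() {
	// Hypothetical scrapes of a missed-blocks counter; the node restarts
	// between the third and fourth scrape, so the value drops back near zero.
	samples := []Sample{{0, 10}, {15, 12}, {30, 15}, {45, 1}, {60, 3}}

	// Gauge-style arithmetic: simply subtract the first value from the last.
	naive := samples[len(samples)-1].Value - samples[0].Value

	fmt.Printf("naive last-first: %.0f\n", naive)                         // prints "naive last-first: -7"
	fmt.Printf("reset-aware increase: %.0f\n", increaseWithResets(samples)) // prints "reset-aware increase: 8"
}
```

With a restart between scrapes, the naive difference reports -7 missed blocks, while the reset-aware sum reports 8 (2 + 3 + 1 + 2). Declaring the metric as a counter is what tells Prometheus to apply the second kind of arithmetic.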
Proposal
- Introduce a New Counter Metric: Implement celestia_consensus_validator_missed_blocks_total as a counter metric that increments each time a validator misses a block.
- Deprecate the Existing Gauge Metric: Mark celestia_consensus_validator_missed_blocks as deprecated in the codebase and documentation, encouraging users to transition to the new counter metric.
- Update Metric Descriptions: Revise the metric descriptions to explain the use of the new counter and the deprecation path for the existing gauge.
# HELP celestia_consensus_validator_missed_blocks (Deprecated) Total missed blocks for a validator. This metric is deprecated and will be removed in future versions. Please use celestia_consensus_validator_missed_blocks_total instead.
# TYPE celestia_consensus_validator_missed_blocks gauge
# HELP celestia_consensus_validator_missed_blocks_total Total number of blocks missed by the validator
# TYPE celestia_consensus_validator_missed_blocks_total counter
- Implementation Timeline: We (ECAD Labs) will submit a Pull Request with an implementation soon.