[DocDB] Table-level cumulative metrics could decrease due to split tablets being deleted
Jira Link: DB-11975
Description
We have cumulative metrics for each RocksDB instance, for example rocksdb_number_db_seek,
which we add up at the table level and then pass through a rate function for dynamic visualisation.
With dynamic tablet splitting, tablets that have already been split are eventually deleted because they are fully replaced by their children. The corresponding parent RocksDB instances are shut down, and their metrics are therefore removed from the table-level sum.
As a result, the table-level cumulative RocksDB metrics decrease. The rate function shows these decreases as spikes, which are then interpreted incorrectly.
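A minimal sketch of how such a decrease can turn into a spike. It assumes the rate function handles counter resets the way Prometheus-style rate functions do, treating any drop as a restart from zero and counting the whole post-drop value as new increase. The sample values and the helper name reset_aware_increase are hypothetical and are not taken from the codebase.

```python
def reset_aware_increase(samples):
    """Total increase over samples, treating any drop as a counter reset to zero."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr < prev:
            # Assumed reset handling: the counter is taken to have restarted
            # from 0, so the whole current value is counted as new increase.
            total += curr
        else:
            total += curr - prev
    return total


# Hypothetical table-level sums of rocksdb_number_db_seek over four scrapes.
# Between the 2nd and 3rd scrape a split parent tablet holding 600 seeks is
# deleted, while the remaining tablets perform only 10 new seeks.
table_sum = [1000, 1010, 420, 430]

per_interval = [
    reset_aware_increase(table_sum[i:i + 2]) for i in range(len(table_sum) - 1)
]
print(per_interval)  # [10.0, 420.0, 10.0]
```

Under that assumption the interval in which the parent disappears reports 420 instead of the real 10, which would show up as a positive spike rather than a negative one.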
Issue Type
kind/enhancement
Warning: Please confirm that this issue does not contain any sensitive information
- I confirm this issue does not contain any sensitive information.
A tablet moving off a TServer, or a TServer restart, could also cause these metrics, which otherwise increase monotonically, to go down. But it is surprising that Grafana etc. display this as a positive spike instead of a negative one. Is there a different flavor of rate function that would do the right thing?
BTW.. excellent debugging @ttyusupov !
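On the question of a different flavor of rate function, one possible approach is to take the increase per tablet first and sum afterwards, so that a tablet that disappears simply drops out instead of lowering a summed counter. The sketch below is only an illustration: it assumes per-tablet values are available to the monitoring side, which may not hold if the sum is computed before export, and the tablet names and numbers are made up.

```python
def increase(prev, curr):
    # Assumed reset handling: a drop is treated as a restart from zero.
    return curr if curr < prev else curr - prev


def increase_of_sum(before, after):
    """Sum the per-tablet counters first, then take the increase."""
    return increase(sum(before.values()), sum(after.values()))


def sum_of_increases(before, after):
    """Take the increase per tablet, then sum over tablets seen in both scrapes."""
    common = before.keys() & after.keys()
    return sum(increase(before[t], after[t]) for t in common)


# Hypothetical per-tablet rocksdb_number_db_seek values at two scrapes.
# The split parent is deleted between them.
before = {"parent": 600, "child_a": 200, "child_b": 210}
after = {"child_a": 205, "child_b": 215}

print(increase_of_sum(before, after))   # 420
print(sum_of_increases(before, after))  # 10
```

The second form ignores tablets that exist in only one of the two scrapes, so activity in the very first interval of a new tablet is not counted in this sketch.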