yugabyte/yugabyte-db

[DocDB] Table-level cumulative metrics could decrease due to split tablets being deleted

Opened this issue · 2 comments

Jira Link: DB-11975

Description

We keep cumulative metrics for each RocksDB instance, for example rocksdb_number_db_seek. These are summed at table level, and a rate function is then applied to the sum for dynamic visualisation.
With dynamic tablet splitting, tablets that have already been split are eventually deleted, since their children fully replace them. The corresponding parent RocksDB instances are shut down, and their metrics are removed from the table-level sum.
As a result, the table-level cumulative RocksDB metrics decrease. The rate function shows these decreases as spikes, which are then interpreted incorrectly.
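
To make the effect concrete, here is a minimal sketch of the scenario. It assumes the dashboard's rate function handles counter resets the way Prometheus's rate() does, i.e. any decrease is treated as a reset to zero and the whole post-drop value is counted as new increase. The tablet names and sample values are made up for illustration and are not taken from a real cluster.

```python
def reset_aware_increase(prev, cur):
    """Increase between two scrapes, treating any drop as a counter reset to zero
    (assumption: this mirrors Prometheus-style reset handling)."""
    return cur - prev if cur >= prev else cur

# Hypothetical rocksdb_number_db_seek samples per scrape.
# None means the RocksDB instance does not exist at that scrape.
parent  = [1000, 1010, 1020, None, None]   # split parent, deleted after scrape 2
child_a = [None, None, 5, 10, 15]          # children appear after the split
child_b = [None, None, 5, 10, 15]
other   = [5000, 5010, 5020, 5030, 5040]   # remaining tablets of the table

# Table-level sum as exported: missing instances simply drop out of the sum.
table_sum = [
    sum(v for v in vals if v is not None)
    for vals in zip(parent, child_a, child_b, other)
]

# Rate-like increase computed on the summed series.
increases = [
    reset_aware_increase(table_sum[i], table_sum[i + 1])
    for i in range(len(table_sum) - 1)
]

print(table_sum)   # the sum drops when the parent is deleted
print(increases)   # the drop is read as a reset, producing a large positive spike
```

In this sketch the sum drops from 6050 to 5050 when the parent disappears, and the reset-aware logic reports an increase of 5050 for that interval instead of roughly 20.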

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.

A tablet moving off a TServer, or a TServer restart, could also cause these metrics, which otherwise increase monotonically, to go down. But it is surprising that Grafana etc. display this as a positive spike instead of a negative one. Is there a different flavor of rate function that would do the right thing?
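
One possible dashboard-side answer, sketched under the assumption of Prometheus-like semantics: compute the increase per RocksDB instance first and sum afterwards, rather than applying the rate to the table-level sum. A series that disappears then just stops contributing instead of looking like a counter reset. This is only an illustration with made-up series names and values, not necessarily the fix adopted for this issue, and it requires per-tablet series to be available to the query.

```python
def per_series_increase(prev, cur):
    """Increase for one series between two scrapes.
    A missing sample on either side contributes nothing; a drop is
    treated as a counter reset (assumed Prometheus-like behaviour)."""
    if prev is None or cur is None:
        return 0
    return cur - prev if cur >= prev else cur

# Hypothetical per-instance counter samples; None means the instance is absent.
series = {
    "parent":  [1000, 1010, 1020, None, None],
    "child_a": [None, None, 5, 10, 15],
    "child_b": [None, None, 5, 10, 15],
    "other":   [5000, 5010, 5020, 5030, 5040],
}
num_scrapes = 5

# Sum of per-series increases for each scrape interval.
summed_increases = [
    sum(per_series_increase(s[i], s[i + 1]) for s in series.values())
    for i in range(num_scrapes - 1)
]

print(summed_increases)  # no spike when the parent is deleted
```

A real counter reset, such as a TServer restart, is still handled per series here, so it does not inflate the table-level result either. The trade-off in this simplified model is that the first sample of a newly appearing series is not counted.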

BTW.. excellent debugging @ttyusupov !