Monitor and track the status of snapshots
liorfranko opened this issue · 4 comments
Hi,
Can you expose metrics that show the status of snapshots?
We want to create a dashboard and alerts to make sure snapshots don't fail.
Thanks,
Thanks for the request! What kind of metrics would you want that aren't available via kubectl? You can already see the snapshot status that way.
Using kubectl is nice, but I want to set up alerts rather than check the snapshots manually.
Examples of metrics:
Number of snapshots
Status of each snapshot
If they're ready or not
Age of snapshot
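To make the list above concrete, here is a minimal sketch of how those values could be rendered in the Prometheus text exposition format. The metric names (gemini_snapshot_count, gemini_snapshot_ready, gemini_snapshot_age_seconds) and the Snapshot record are hypothetical and are not something Gemini exposes today; the fields are assumed to come from the VolumeSnapshot objects in the cluster.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Snapshot:
    # Hypothetical record. In practice these fields would be read from each
    # VolumeSnapshot (for example status.readyToUse and the creation timestamp).
    name: str
    namespace: str
    ready: bool
    created: datetime


def render_metrics(snapshots, now):
    """Render count, readiness, and age as Prometheus text exposition lines."""
    lines = [
        "# TYPE gemini_snapshot_count gauge",
        f"gemini_snapshot_count {len(snapshots)}",
        "# TYPE gemini_snapshot_ready gauge",
    ]
    for s in snapshots:
        labels = f'namespace="{s.namespace}",name="{s.name}"'
        # 1 when the snapshot is ready to use, 0 otherwise.
        lines.append(f"gemini_snapshot_ready{{{labels}}} {int(s.ready)}")
    lines.append("# TYPE gemini_snapshot_age_seconds gauge")
    for s in snapshots:
        labels = f'namespace="{s.namespace}",name="{s.name}"'
        age = int((now - s.created).total_seconds())
        lines.append(f"gemini_snapshot_age_seconds{{{labels}}} {age}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(render_metrics([Snapshot("pg-1", "default", True, now)], now))
```

With gauges like these, an alert could fire when gemini_snapshot_ready stays at 0 for too long, or when the newest snapshot's age exceeds the schedule interval.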
This reminds me of https://kubernetes.io/blog/2021/04/16/volume-health-monitoring-alpha-update/.
However, I think the most sensible things to add to the controller are Prometheus metrics for things like a snapshot failing to create, the number of active create/restore processes, and the total number of PVCs and snapshots managed by the controller.
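As a rough illustration of the first two ideas (a failure counter and a gauge of in-flight create/restore operations), here is a small stdlib-only sketch. The ControllerMetrics class and its track helper are hypothetical; a real implementation would more likely use a Prometheus client library inside the controller, and this sketch does not cover the PVC and snapshot totals.

```python
import threading
from contextlib import contextmanager


class ControllerMetrics:
    """Hypothetical in-memory metrics for create/restore operations."""

    def __init__(self):
        self._lock = threading.Lock()
        # Counter semantics: failures only ever increase.
        self.failures = {"create": 0, "restore": 0}
        # Gauge semantics: number of operations currently in flight.
        self.active = {"create": 0, "restore": 0}

    @contextmanager
    def track(self, kind):
        """Wrap one operation of the given kind ("create" or "restore")."""
        with self._lock:
            self.active[kind] += 1
        try:
            yield
        except Exception:
            # Record the failure, then let the caller see the error.
            with self._lock:
                self.failures[kind] += 1
            raise
        finally:
            with self._lock:
                self.active[kind] -= 1


if __name__ == "__main__":
    metrics = ControllerMetrics()
    try:
        with metrics.track("create"):
            raise RuntimeError("snapshot failed to create")
    except RuntimeError:
        pass
    print(metrics.failures, metrics.active)
```

Wrapping each operation in one place keeps the active gauge correct even when an operation raises, which is the case the alerts care about most.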
Can we re-open this? Having metrics to understand that the Gemini controller is working and that our Gemini resources are valid (i.e., point to real PVCs) is critical. I can't see any current status output on the SnapshotGroup resource that we can use to get an indication of whether or not the controller is working and the configuration is valid.
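For the "point to real PVCs" check, something along these lines would be enough for alerting. This is only a sketch: the input shapes and the gemini_snapshotgroup_valid metric name are hypothetical, and it assumes the claim name of each SnapshotGroup and the list of existing PVCs have already been fetched from the cluster.

```python
def invalid_groups(groups, existing_pvcs):
    """Return the SnapshotGroups whose claim does not exist.

    groups: dict mapping (namespace, group_name) -> claim_name
    existing_pvcs: set of (namespace, claim_name) tuples
    """
    return sorted(
        key for key, claim in groups.items()
        if (key[0], claim) not in existing_pvcs
    )


def render_validity(groups, existing_pvcs):
    """Render one gauge per SnapshotGroup: 1 if its PVC exists, else 0."""
    bad = set(invalid_groups(groups, existing_pvcs))
    lines = ["# TYPE gemini_snapshotgroup_valid gauge"]
    for namespace, name in sorted(groups):
        value = 0 if (namespace, name) in bad else 1
        lines.append(
            f'gemini_snapshotgroup_valid{{namespace="{namespace}",name="{name}"}} {value}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    groups = {("default", "postgres"): "postgres-data"}
    print(render_validity(groups, {("default", "postgres-data")}))
```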