Improve Metrics and Dashboard
ryanolson opened this issue · 0 comments
Possible Metrics and Status
- batches / second (counter/rate)
- inferences / second (counter/rate)
- gpu power (gauge)
- queue depth (gauge)
- request time (summary quantile 50/90/99)
- compute time (summary quantile 50/90/99)
- load_ratio [request time / compute time] (histogram: buckets [2, 4, 10, 100, 1000])
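
To make the load_ratio histogram concrete, here is a rough sketch of how observations would land in the proposed buckets. It assumes Prometheus-style cumulative buckets: each bucket counts every observation less than or equal to its upper bound, and an implicit +Inf bucket counts everything. The helper names and the sample timings are made up for illustration and are not from this project.

```python
import math

# Bucket upper bounds proposed above for load_ratio.
LOAD_RATIO_BUCKETS = [2, 4, 10, 100, 1000]


def load_ratio(request_time, compute_time):
    """Ratio of end-to-end request time to compute time (hypothetical helper)."""
    return request_time / compute_time


def cumulative_bucket_counts(values, bounds=LOAD_RATIO_BUCKETS):
    """Count observations per bucket, Prometheus style: each bucket holds every
    observation <= its upper bound, and the +Inf bucket holds all of them."""
    uppers = list(bounds) + [math.inf]
    return {le: sum(1 for v in values if v <= le) for le in uppers}


# Made-up (request_time, compute_time) pairs in seconds, for illustration only.
samples = [(0.015, 0.010), (0.030, 0.010), (0.080, 0.010), (0.500, 0.010), (20.0, 0.010)]
ratios = [load_ratio(r, c) for r, c in samples]
counts = cumulative_bucket_counts(ratios)

for le, count in counts.items():
    print(f"load_ratio_bucket{{le=\"{le}\"}} {count}")
```

A ratio near 1 means a request spent almost all of its time computing. Large ratios mean most of the request time was spent waiting, which is why the buckets are spaced widely rather than evenly.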
The Grafana panels need serious work. Does anyone have a good way to visualize Prometheus histograms with Grafana?
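
One common option is to plot quantiles estimated from the bucket counts, which is what Prometheus's `histogram_quantile` function does using linear interpolation inside the matching bucket. The sketch below approximates that estimate in plain Python, to show what such a panel would display and how much the result depends on the bucket boundaries. The function name and the bucket counts are illustrative assumptions, and this is a simplification rather than the exact Prometheus implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    The lowest bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            if count == prev_count:
                # Empty bucket (only reachable when rank is 0).
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up cumulative counts for the load_ratio buckets, for illustration only.
example = [(2, 50), (4, 80), (10, 95), (100, 99), (1000, 100), (math.inf, 100)]
p50 = estimate_quantile(0.50, example)
p90 = estimate_quantile(0.90, example)
```

With buckets as wide as 10 to 100 or 100 to 1000, the interpolated value can be far from the true quantile, so a heatmap of per-bucket counts may be a more honest view of the same data than a single quantile line.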