[feature request] OpenTelemetry endpoint in the backend
erdnaxe opened this issue · 1 comments
erdnaxe commented
This needs discussion as we mostly don't want to bloat Tulip.
It would be nice to have a /metrics endpoint in the backend following the OpenTelemetry format.
This could allow teams to monitor their instance and be alerted when something is going badly wrong, before it shows up on the scoreboard.
Metrics wishlist:
- (Counter per service) Total count of TCP flows in the MongoDB
- (Counter per service) Total size in bytes of all payloads in the MongoDB
- (Counter per service) Total number of FLAG OUT / FLAG IN
- (Counter) Total number of backend API requests
- (Gauge) Average backend API response time
I don't believe we should expose per-TCP flow information as the Tulip frontend is already made for that.
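To make the wishlist concrete, here is a rough sketch of what the /metrics response body could look like. It assumes the Prometheus text exposition format, which is what a scraped /metrics endpoint usually serves and which an OpenTelemetry collector can ingest. All metric names (`tulip_flows_total` and so on), the `ServiceStats` type and the `render_metrics` helper are hypothetical, and the numbers would come from MongoDB queries that are not shown here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServiceStats:
    # Hypothetical per-service aggregates; in Tulip these would be
    # computed from the flows stored in MongoDB.
    flows: int
    payload_bytes: int
    flags_in: int
    flags_out: int


def escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(services: Dict[str, ServiceStats],
                   api_requests: int,
                   api_avg_seconds: float) -> str:
    """Render the wishlist metrics as Prometheus-style text (names are assumptions)."""
    lines = []

    lines.append("# TYPE tulip_flows_total counter")
    for name, s in sorted(services.items()):
        lines.append(f'tulip_flows_total{{service="{escape_label(name)}"}} {s.flows}')

    lines.append("# TYPE tulip_payload_bytes_total counter")
    for name, s in sorted(services.items()):
        lines.append(f'tulip_payload_bytes_total{{service="{escape_label(name)}"}} {s.payload_bytes}')

    lines.append("# TYPE tulip_flags_total counter")
    for name, s in sorted(services.items()):
        svc = escape_label(name)
        lines.append(f'tulip_flags_total{{service="{svc}",direction="in"}} {s.flags_in}')
        lines.append(f'tulip_flags_total{{service="{svc}",direction="out"}} {s.flags_out}')

    lines.append("# TYPE tulip_api_requests_total counter")
    lines.append(f"tulip_api_requests_total {api_requests}")

    lines.append("# TYPE tulip_api_response_seconds_avg gauge")
    lines.append(f"tulip_api_response_seconds_avg {api_avg_seconds}")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics({"web": ServiceStats(3, 1200, 1, 2)}, 10, 0.25), end="")
```

Nothing here is per-flow: every sample is an aggregate, so the output stays small no matter how many flows are stored.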
ItsShadowCone commented
massive +1
i would go further than just "health" checks, i agree with no per-TCP flow information, but we should also group by:
- relevant data per-tick, probably also per-service, maybe rolling counters so it can be properly ingested into a time-series database
- relevant data per-tag, per-service and optionally also per-tick. not just flag in and out.
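One possible reading of the grouping above, as a minimal sketch: bucket flows by tick index, count them per (service, tag), and keep only the most recent ticks so the exported set stays bounded. The `TickCounters` class, its parameters and the tag names in the usage note are assumptions for illustration, not existing Tulip code.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


class TickCounters:
    """Rolling per-tick counters keyed by (service, tag). Hypothetical helper."""

    def __init__(self, start_ts: float, tick_seconds: float, keep_ticks: int = 10):
        self.start_ts = start_ts          # timestamp of tick 0
        self.tick_seconds = tick_seconds  # tick length in seconds
        self.keep_ticks = keep_ticks      # how many ticks to retain
        self._ticks: Dict[int, Counter] = {}

    def tick_of(self, ts: float) -> int:
        # Map a flow timestamp to the index of the tick it falls in.
        return int((ts - self.start_ts) // self.tick_seconds)

    def record(self, ts: float, service: str, tags: Iterable[str]) -> None:
        # Count one flow once per tag it carries.
        bucket = self._ticks.setdefault(self.tick_of(ts), Counter())
        for tag in tags:
            bucket[(service, tag)] += 1
        # Drop the oldest ticks once the window is full.
        while len(self._ticks) > self.keep_ticks:
            del self._ticks[min(self._ticks)]

    def samples(self) -> Iterator[Tuple[int, str, str, int]]:
        # Yield (tick, service, tag, count) in a stable order for export.
        for tick in sorted(self._ticks):
            for (service, tag), count in sorted(self._ticks[tick].items()):
                yield tick, service, tag, count
```

Usage would look like `c = TickCounters(start_ts=0, tick_seconds=60)`, then `c.record(65, "web", ["flag-in"])` for each flow, with `c.samples()` feeding whatever exporter is chosen. Whether the tick should become a label or just the sample timestamp is an open question, since a tick label grows the number of series over time.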