Report job type (e.g. module name) in Telemetry events
soundmonster opened this issue · 6 comments
I've been using Rihanna in production for a while now and would like to thank the maintainers for the great work and a very useful, slim system.
We subscribe to the Telemetry events emitted by Rihanna and export them to a dashboard. Our system schedules different types of jobs, and it would be very useful to differentiate between these types in the dashboard. For this to work, I see three options:
- Use the term:
  - Job module
  - Module/function (arguments optional)
  - Send the term as is and do post-processing in the subscriber
- Use an additional, free-form JSON field for telemetry metadata with the job
- Don't change anything in Rihanna. Instead, let the Telemetry subscriber pull out the metadata from Postgres (can only work for the events that contain a `job_id`, see below). IMHO this option is suboptimal in terms of separation of concerns: it requires the subscriber (concern: instrumentation) to have Rihanna as a hard dependency (leaking concern: how does Rihanna store jobs?).
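To make the first option more concrete, here is a minimal sketch of how a job term could be reduced to a short "job type" label, either inside Rihanna or in the subscriber. It assumes the term is either a `{module, function, args}` tuple or a `{module, arg}` tuple; the `JobType` module and its `label/1` function are hypothetical names and not part of Rihanna.

```elixir
defmodule JobType do
  @moduledoc """
  Hypothetical helper (not part of Rihanna): derives a short label from a job term.
  """

  # Assumed MFA shape: {module, function, args}. The arguments themselves are
  # dropped and only their count (the arity) is kept.
  def label({mod, fun, args}) when is_atom(mod) and is_atom(fun) and is_list(args) do
    "#{inspect(mod)}.#{fun}/#{length(args)}"
  end

  # Assumed behaviour-style shape: {module, arg}. Only the module is reported.
  def label({mod, _arg}) when is_atom(mod), do: inspect(mod)

  # Anything else gets a fixed label so the set of dashboard tags stays small.
  def label(_other), do: "unknown"
end
```

Dropping the arguments keeps the label low-cardinality, which most dashboards prefer; sending the raw term instead would push this logic into the subscriber.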
Rihanna already provides support for telemetry for the following events:
- `enqueued`
- `deleted`
- `locked` (count only)
- `succeeded`
- `failed`
- `reenqueued`
- `retried`
- `released`
Most of the events report a count of 1 and the numeric job ID, except:
- `deleted` sometimes will report a count > 1 and hence no id, and
- `locked` only reports the count
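For context, this is roughly what a subscriber has to work with today. The sketch below assumes the events are named `[:rihanna, :job, <name>]`, that the count arrives in the measurements map, and that the job ID arrives in the metadata map; none of that is confirmed here, so check what Rihanna actually emits. `DashboardHandler` and the `sink` process are hypothetical.

```elixir
defmodule DashboardHandler do
  # Event suffixes taken from the list above; the [:rihanna, :job, _] prefix
  # is an assumption.
  @names [:enqueued, :deleted, :locked, :succeeded, :failed, :reenqueued, :retried, :released]

  def events, do: Enum.map(@names, fn name -> [:rihanna, :job, name] end)

  # Handler with the four-argument shape that :telemetry expects.
  # It forwards a small tuple to the process given in the handler config.
  def handle_event(event, measurements, metadata, %{sink: sink}) do
    name = List.last(event)
    count = Map.get(measurements, :count, 0)
    # job_id is nil for events that only carry a count (e.g. locked).
    job_id = Map.get(metadata, :job_id)
    send(sink, {:metric, name, count, job_id})
  end
end
```

With the `telemetry` package available, the handler could be registered via `:telemetry.attach_many/4`, passing `DashboardHandler.events()`, `&DashboardHandler.handle_event/4` and `%{sink: pid}` as the config. Note that nothing in the payload identifies the job type, which is the gap this issue is about.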
I would like to help implement this, and would highly appreciate any input from the maintainer team.
@soundmonster We're all pretty busy these days, but the best thing for you to do is open a PR and we'll make comments and help you to get it shipshape 👍
Thank you for your response! I’ll try to cook up something in the next few weeks. Also, stay safe, everyone.
Just took a quick look at this, and some of these would be quite difficult to do as we only have the job_id and don't perform an additional query to pull the job out of the DB to report it.
I think it should be easy to add to `enqueued` and delete for a single job id.
It might be possible to get it for `locked`, but it would end up being a list of terms or something. Or, we would have to change to doing a single call for `locked` for each individual job, which would change how people are using it. This could maybe be done with a new key instead of `locked`.
Looks like, in many cases, we could update the `JobDispatcher` to return the job. It seems the job id is most often pulled out of a full job struct/map.
@samsondav @soundmonster Do we think "breaking" the telemetry is a good idea in order to track individual job events for `locked`? Looking here: https://github.com/samsondav/rihanna/blob/master/lib/rihanna/job.ex#L425
Though, I guess if we just split `locked` into an individual call for each job and still use a `count: 1`, the sum of them should still be the same. So maybe that doesn't break the api …
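A rough sketch of that idea, to check that the totals line up. The event name, the payload shapes, and the `LockedEvents` module are all hypothetical, and jobs are modelled as plain maps with `:id` and `:term` keys rather than Rihanna's own struct.

```elixir
defmodule LockedEvents do
  # Hypothetical event name, for illustration only.
  @event [:rihanna, :job, :locked]

  # Behaviour as described in this thread: one event carrying only a count.
  def batched(jobs), do: [{@event, %{count: length(jobs)}, %{}}]

  # Proposed alternative: one event per job, each with count: 1 plus job details.
  def per_job(jobs) do
    for %{id: id, term: term} <- jobs do
      {@event, %{count: 1}, %{job_id: id, term: term}}
    end
  end

  # Sums the counts the way a counter metric on the subscriber side would.
  def total(events) do
    events |> Enum.map(fn {_event, %{count: count}, _meta} -> count end) |> Enum.sum()
  end
end
```

Each tuple stands for one `:telemetry.execute/3` call (event name, measurements, metadata). A subscriber that only sums `count` sees the same total either way; a subscriber that counts the number of `locked` events received would see a change, so that is the case to watch for.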