samsondav/rihanna

Report job type (e.g. module name) in Telemetry events

soundmonster opened this issue · 6 comments

I've been using Rihanna in production for a while now and would like to thank the maintainers for the great work and a very useful, slim system.

We're subscribing to the Telemetry events emitted by Rihanna and exporting them to a dashboard. Our system schedules different types of jobs, and it would be very useful to differentiate between these types in the dashboard. For this to work, I see three options:

  • Use the term:
    • Job module
    • Module/function (arguments optional)
    • Send the term as is and do post-processing in the subscriber
  • Use an additional, free-form JSON field for telemetry metadata with the job
  • Don't change anything in Rihanna. Instead, let the Telemetry subscriber pull out the metadata from Postgres (can only work for the events that contain a job_id, see below). IMHO this option is suboptimal in terms of separation of concerns: it requires the subscriber (concern: instrumentation) to have Rihanna as a hard dependency (leaking concern: how does Rihanna store jobs?).
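
To make the first option concrete, here is a minimal sketch of how a job term could be reduced to a dashboard label. It assumes the term is either a `{module, arg}` tuple or a `{module, function, args}` tuple; `JobLabel` and `from_term/1` are hypothetical names for illustration and not part of Rihanna.

```elixir
defmodule JobLabel do
  @moduledoc """
  Hypothetical helper (not part of Rihanna): turns a job term into a
  short string that can be used as a dashboard label.
  """

  # Assumed shape: {module, function, args} for MFA-style jobs.
  # Reports "Module.function/arity" and leaves the arguments out.
  def from_term({mod, fun, args})
      when is_atom(mod) and is_atom(fun) and is_list(args) do
    "#{inspect(mod)}.#{fun}/#{length(args)}"
  end

  # Assumed shape: {module, arg} for behaviour-style jobs.
  # Reports only the job module.
  def from_term({mod, _arg}) when is_atom(mod), do: inspect(mod)

  # Anything else is labelled "unknown" instead of raising, so a
  # telemetry handler calling this never crashes on an odd term.
  def from_term(_other), do: "unknown"
end

JobLabel.from_term({MyApp.Mailer, :deliver, ["a", "b"]})
# => "MyApp.Mailer.deliver/2"
JobLabel.from_term({MyApp.EmailJob, %{id: 1}})
# => "MyApp.EmailJob"
```

Whether this runs inside Rihanna (the "Job module" and "Module/function" sub-options) or in the subscriber (the "send the term as is" sub-option) only changes where the function lives, not the function itself.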

Rihanna already provides support for telemetry for the following events:

  • enqueued
  • deleted
  • locked (count only)
  • succeeded
  • failed
  • reenqueued
  • retried
  • released

Most of the events report a count of 1 and the numeric job ID, except:

  • deleted will sometimes report a count > 1 and hence no ID, and
  • locked only reports the count

I would like to help implement this, and would greatly appreciate any input from the maintainer team.

@soundmonster We're all pretty busy these days, but the best thing for you to do is open a PR and we'll make comments and help you to get it shipshape 👍

Thank you for your response! I’ll try to cook up something in the next few weeks. Also, stay safe, everyone.

Just took a quick look at this, and some of these would be quite difficult to do as we only have the job_id and don't perform an additional query to pull the job out of the DB to report it.

I think it should be easy to add to enqueued, and to deleted when it is called for a single job id.

It might be possible to get it for locked, but it would end up being a list of terms or something. Or we would have to switch to emitting a separate locked call for each individual job, which would change how people are using it. This could maybe be done with a new key instead of locked.

Looks like, in many cases, we could update the JobDispatcher to return the job. It seems the job id is most often pulled out of a full job struct/map.

@samsondav @soundmonster Do we think "breaking" the telemetry is a good idea in order to track individual job events for locked? Looking here: https://github.com/samsondav/rihanna/blob/master/lib/rihanna/job.ex#L425

Though, I guess if we just split locked into an individual call for each job and still use a count: 1, the sum of them should still be the same. So maybe that doesn't break the api …
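
A rough sketch of that idea, to check the arithmetic: one event per locked job with `count: 1`, so that a subscriber summing the counts sees the same total as with a single batched event. The event name `[:rihanna, :job, :locked]`, the `job_id` metadata key and the `LockedEvents` module are assumptions for illustration, not the current implementation.

```elixir
defmodule LockedEvents do
  # Hypothetical helper, not part of Rihanna.
  # Assumed event name; the real one may differ.
  @event [:rihanna, :job, :locked]

  # Builds one {event, measurements, metadata} tuple per locked job.
  # Each tuple has the argument shape expected by :telemetry.execute/3.
  def per_job(jobs) do
    for job <- jobs do
      {@event, %{count: 1}, %{job_id: job.id}}
    end
  end

  # Sums the counts the way a dashboard aggregating the events would.
  def total_count(events) do
    events
    |> Enum.map(fn {_event, %{count: count}, _metadata} -> count end)
    |> Enum.sum()
  end
end

jobs = [%{id: 1}, %{id: 2}, %{id: 3}]
events = LockedEvents.per_job(jobs)
LockedEvents.total_count(events)
# => 3, the same value a single batched event would carry as its count
```

Subscribers that sum the count keep working unchanged. Subscribers that treat each locked event as "one poll of the queue" would see a different number of events, which is the part that could still count as a breaking change.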