riverqueue/river

Monitoring queue support

mihaioprescu opened this issue · 2 comments

I've integrated a queue into my service, and I would like to know if there is a way to monitor the jobs in that queue so I can decide whether an alert should be generated because too many jobs are failing within a specific time interval.

At the moment I have found no way of retrieving a list of the jobs that are in a specific state and were finalized in the past X minutes (just an example).

Could support be added for that? Is there any way I could work around this using the JobList() func?

@mihaioprescu Check out subscriptions:

https://riverqueue.com/docs/subscriptions

What you could do is listen on the event EventKindJobFailed, observe the properties on failing jobs (e.g. if the new state is discarded) and/or the rate of failure, and then emit alerts based on that.

Alternatively, you could list jobs in a finalized state, ordered by finalized_at descending, passing a large limit to First, and then filter on finalized_at by iterating over the rows in code. It's true that you can't filter on finalized_at > ? directly, but the general approach will still work.

cc @bgentry Think there's other ways to solve this one, but a request for arbitrary filtering conditions for job listing was somewhat inevitable, and this won't be the last.

I would strongly recommend against attempting to do this by listing the jobs table repeatedly. You will likely run into problems with this approach if you have any sort of scale.

Instead, this is the type of problem that telemetry is meant to solve. With a good telemetry system (like OpenTelemetry) you should be able to scale this approach far beyond River’s own throughput limitations.
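As a rough illustration of the idea, the service only increments a counter per failure and leaves rates and alert thresholds to the telemetry backend. This sketch uses the standard library's expvar package purely as a stand-in for a real metrics library such as OpenTelemetry; recordFailure and failureCount are hypothetical names:

```go
package main

import "expvar"

// failedJobs counts failures per job kind. expvar is used here only as a
// stdlib stand-in for a real telemetry library.
var failedJobs = new(expvar.Map).Init()

// recordFailure increments the failure counter for the given job kind.
func recordFailure(kind string) {
	failedJobs.Add(kind, 1)
}

// failureCount returns the current failure count for a job kind, or 0 if
// no failure has been recorded for it.
func failureCount(kind string) int64 {
	if v, ok := failedJobs.Get(kind).(*expvar.Int); ok {
		return v.Value()
	}
	return 0
}
```

The point of the design is that the jobs table is never queried for monitoring: each failure costs one counter increment, and the metrics system does the windowing.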

You can construct this using the subscription primitive @brandur shared above. I think we also have some work to do to make this all easier to set up, hoping we’ll get into that soon.

I’m going to convert this to a discussion for now until we have specific issues to track around this (things that aren’t supported or need better UX).