oban-bg/oban

Check period for given timestamp while checking for uniqueness

Closed this issue · 6 comments

Is your feature request related to a problem? Please describe.

We use Oban to make our entities transition from one state to another in background.

Such Jobs can be scheduled system-wide or per entity, but in both cases we would like to rely on oban unique feature to ensure that our system is not scheduling the same job twice and that we're not overloading our system with too many jobs.

To allow for that to happen we initially though that we can just configure period.
But upon closer inspection we found out that period is checked in relation to the job's inserted_at timestamp, which is not what we want.

Describe the Solution You'd Like

Either of the following would work for us, but IMO the second option adds more flexibility:

  1. Allow for period to be checked in relation to the job's scheduled_at timestamp, e.g.:

    use Oban.Worker,
      unique: [
        period: [
          scheduled_at: 60, 
          inserted_at: 10,
          # ...
          ]
      ]
  2. Allow for dynamic options definition, e.g.:

    use Oban.Worker,
      unique: &unique/1
    
    
    def unique(job) do
      from(j in Oban.Job,
        where: j.scheduled_at <= ^DateTime.add(job.scheduled_at, 60, :second),
        where: j.scheduled_at >= ^DateTime.add(job.scheduled_at, -60, :second),
      )
    

Describe Alternatives You've Considered

It is possible to achieve described above behaviour by disabling Oban's uniqueness check and implementing it in the application code, but it would be nice to have it out of the box.

This is a legacy decision that I've regretted and wanted to fix for a long time. What do you think about an additional unique field that allows controlling the timestamp it considers?

use Oban.Worker, unique: [period: 60, timestamp: :scheduled_at]

The default timestamp would remain :inserted_at for backward compatibility, and we could eventually switch to scheduled_at in a major version bump.

Going about it this way limits the solution to handle only one timestamp.

Backwards compatibility of the variant I proposed could be achieved in a following way – period's type is list({field :: atom(), period} | period), where period: non_neg_integer() | :infinity. Thus, we can assume that if field is not specified it shall be processed as period of inserted_at, e. g.: period: [30, scheduled_at: 60] is period: [inserted_at: 30, scheduled_at: 60].

Btw, I already implemented it that way for the project I was referring to, and there is another detail – nil shall be valid unique option when defining worker.

Going about it this way limits the solution to handle only one timestamp.

The goal is only to handle one timestamp. The logic around unique checks is complex enough; extending it to check multiple timestamps would make it even more opaque.

It's always possible to support custom unique checks with multiple timestamps in your own application if necessary. However, that isn't a direction I'm willing to go for the core unique options.

and there is another detail – nil shall be valid unique option when defining worker.

Why use unique: nil instead of omitting unique altogether?

The goal is only to handle one timestamp. The logic around unique checks is complex enough; extending it to check multiple timestamps would make it even more opaque.

Ah ok. Then I'll rewrite it to handle only one.

It's always possible to support custom unique checks with multiple timestamps in your own application if necessary. However, that isn't a direction I'm willing to go for the core unique options.

IMO it'll be nice to be able to define it in the worker itself, however there might be possible mix up of storage- and logic-level code.

Why use unique: nil instead of omitting unique altogether?

Indeed, seems I missed that option. The only reason left is for clarity.

https://github.com/sorentwo/oban/blob/de3882332fa3b506076e707e1859e826495dc1c4/lib/oban/engines/basic.ex#L416

Shouldn't uniqueness check be performed relative to the field rather than now?

PS Just switched from my fork to 2.16, and noticed that my jobs do not pass the check.

The only change in that function was to dynamically select the field to check (either inserted_at or scheduled_at):

-  defp since_period(query, period) do
-    where(query, [j], j.inserted_at >= ^seconds_from_now(-period))
+  defp since_period(query, period, timestamp) do
+    where(query, [j], field(j, ^timestamp) >= ^seconds_from_now(-period))