fatkodima/sidekiq-iteration

Pull into Sidekiq core?

Closed · 6 comments

Hey @fatkodima, would you be interested in integrating this functionality into Sidekiq core for 7.3 or have me do it? I've had several customers report this gem as very useful for solving their problems with long-running jobs, making deployments quicker and safer, etc. I think it's a good pattern/API to encourage people to use.

Hey! Wow, it would be awesome to get this merged into Sidekiq itself!

I will try to do that this weekend (or next weekend) and see how it goes. Let me know if you plan to release 7.3 sooner.

I have a 7.3 milestone targeting a summer release. 7.2.3 will be out very soon.

Wanted to ask, what API would you prefer?

1. (my preference)

class MyJob
  include Sidekiq::Job
  include Sidekiq::Iteration
end

2. or something like

class MyJob
  include Sidekiq::Job
  sidekiq_options iteration: true, ...
end

And what API would you prefer for throttling (https://github.com/fatkodima/sidekiq-iteration/blob/master/guides/throttling.md)? Currently it is configured via a top-level call in the class body.

I'd probably go with:

class SomeJob
  include Sidekiq::Job
  include Sidekiq::Job::Iterable

  sidekiq_options iteration: { whatever: 123 }
end

Unlike Rails, I dislike top-level class methods like throttle_on, as they can be hard to test and mock. I would prefer that it be an instance method, since server middleware is handed the job instance:

class ThrottleMiddleware
  include Sidekiq::ServerMiddleware

  def call(instance, job, queue)
    if instance.throttle_on?
      # do something
    end
  end
end
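To illustrate the testability point, here is a rough sketch in plain Ruby, without Sidekiq loaded. `ThrottleCheck`, `ImportJob`, `database_load`, the 0.8 threshold and the `:throttled` return value are all hypothetical names and values made up for this example; what a real middleware should do when throttled (reschedule, sleep, etc.) is left open, as in the snippet above.

```ruby
# Plain-Ruby stand-in with the same call(instance, job, queue) shape as a
# Sidekiq server middleware. It does NOT include Sidekiq::ServerMiddleware,
# so it can run without the gem.
class ThrottleCheck
  def call(instance, _job, _queue)
    # Only jobs that define the (hypothetical) predicate are checked.
    if instance.respond_to?(:throttle_on?) && instance.throttle_on?
      return :throttled # placeholder for "do something"
    end

    yield
  end
end

# Hypothetical job: throttling is an ordinary instance method, so a test can
# set or stub its inputs on a single object instead of patching the class.
class ImportJob
  attr_writer :database_load

  def database_load
    @database_load || 0.0
  end

  def throttle_on?
    database_load > 0.8
  end

  def perform
    :performed
  end
end

middleware = ThrottleCheck.new
job = ImportJob.new

middleware.call(job, {}, "default") { job.perform } # runs the job

job.database_load = 0.95
middleware.call(job, {}, "default") { job.perform } # skipped, block not called
```

Because the decision lives on the instance, a test only needs to build one job object and control what `throttle_on?` sees; no class-level state has to be reset between tests.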

As a suggestion, @mperham: I feel like the framework should be pulled into Sidekiq, but not the concrete implementations.

ActiveRecord usage can be offered as a suggestion, as I described in #9:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.in_batches(start: cursor) do |relation|
      yielder.yield(relation, relation.maximum(:id))
    end
  end
end

def each_iteration(relation)
  relation.update_all(...)
end

Or for batches:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_in_batches(start: cursor) do |batch|
      yielder.yield(batch, batch.last.id)
    end
  end
end

def each_iteration(batch)
  batch.each { ... }
end

Or for individual records:

def build_enumerator(cursor:)
  Enumerator.new do |yielder|
    MyModel.find_each(start: cursor) do |record|
      yielder.yield(record, record.id)
    end
  end
end

def each_iteration(record)
  record.update(...)
end
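For context, the framework part that would consume these (object, cursor) pairs can be sketched in plain Ruby roughly as below. This is only an illustration under assumptions: `run_iterations`, `max_iterations`, `UpcaseJob`, `Record` and `RECORDS` are hypothetical, the in-memory array stands in for a table, and the sketch resumes strictly after the saved cursor, which may differ from how the real enumerators treat `start:`.

```ruby
Record = Struct.new(:id, :name)

# In-memory stand-in for a database table, ordered by id.
RECORDS = [
  Record.new(1, "a"), Record.new(2, "b"),
  Record.new(3, "c"), Record.new(4, "d")
].freeze

class UpcaseJob
  attr_reader :processed

  def initialize
    @processed = []
  end

  # Same shape as the examples above: yield the object plus its cursor.
  # Assumption: the cursor is the id of the last processed record.
  def build_enumerator(cursor:)
    Enumerator.new do |yielder|
      RECORDS.each do |record|
        next if cursor && record.id <= cursor

        yielder.yield(record, record.id)
      end
    end
  end

  def each_iteration(record)
    @processed << record.name.upcase
  end
end

# Hypothetical driver loop: runs iterations, remembers the latest cursor,
# and stops early after max_iterations to simulate an interruption
# (for example, a deploy shutting the process down).
def run_iterations(job, cursor: nil, max_iterations: nil)
  count = 0
  job.build_enumerator(cursor: cursor).each do |object, new_cursor|
    job.each_iteration(object)
    cursor = new_cursor
    count += 1
    break if max_iterations && count >= max_iterations
  end
  cursor
end

job = UpcaseJob.new
checkpoint = run_iterations(job, max_iterations: 2) # interrupted after 2 records
run_iterations(job, cursor: checkpoint)             # resumes with the rest
```

The point of the split is that the job only describes what to iterate and what to do per item, while the loop that saves the cursor and decides when to stop belongs to the framework.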

Feels like having the CSV, Array and AR enumerators may be too much. I'm not sure, just throwing ideas out here.

Having optimized support for a few well-known types/libraries is useful, but we should have generic Enumerable support too.
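A minimal sketch of what generic Enumerable support could look like, assuming the cursor is simply the position of the last processed item. `enumerable_enumerator` is a hypothetical helper name, not an existing API, and a positional cursor only makes sense if the collection is ordered and unchanged between runs.

```ruby
# Wraps any Enumerable and yields (item, index) pairs, skipping everything
# up to and including the saved cursor. A nil cursor means "start from the
# beginning".
def enumerable_enumerator(enumerable, cursor:)
  skip = cursor.nil? ? 0 : cursor + 1

  Enumerator.new do |yielder|
    enumerable.each_with_index do |item, index|
      next if index < skip

      yielder.yield(item, index)
    end
  end
end

# Works for arrays and for anything else that includes Enumerable.
enumerable_enumerator(%w[a b c d], cursor: nil).to_a
enumerable_enumerator(%w[a b c d], cursor: 1).to_a # resumes at "c"
enumerable_enumerator(10..13, cursor: 2).to_a      # resumes at 13
```

Skipping by index still walks the already-processed items on resume, which is why optimized enumerators for specific sources remain worthwhile alongside a generic fallback like this.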