jmettraux/rufus-scheduler

Rufus stops running cron jobs for a while, and then starts again

jmanian opened this issue · 2 comments

Apologies in advance if this is the wrong forum for my issue. I'm happy to repost wherever is appropriate.

We use Rufus (3.5.2) in my app to run periodic jobs (as frequent as every 5 minutes and infrequent as weekly). Our app is a Rails app hosted on Heroku, but we don't use Rufus in the Rails process. Instead Rufus runs in a dedicated scheduler process, and the only thing that actually happens in the scheduler process is it initiates Sidekiq jobs (which means that it writes to redis).

Occasionally Rufus will suddenly stop executing the cron jobs for a couple hours, and then suddenly start again. When this happens there are no errors or anything else in our logs. It just goes silent for a while and then continues again later. Do you have any idea how this could be happening? Maybe there's something wrong with my setup? Maybe it's a bug in Rufus?

Heroku restarts each process roughly every 24 hours, but these blackouts don't necessarily happen anywhere near the Heroku restarts. For instance on Monday it went dark at 17:31 and then came back at 20:33, so just over 3 hours. The prior restart was at 15:38, almost 2 hours before, and it didn't restart again until the following day.

The way that I can tell when it stops executing is I have a log statement set up with * * * * *, so when that log statement goes missing I know that Rufus has stopped. I can also see in the logs that the Sidekiq jobs are not being initiated.


Here's an abbreviated version of the code for our scheduler.

In our Procfile:

scheduler: bundle exec rake scheduler:run
# scheduler.rake
namespace :scheduler do
  task :run => :environment do
    # on the command line in dev ctrl-C gives a SIGINT
    # heroku sends a SIGTERM when it restarts
    trap('SIGINT') { Scheduler.terminate }
    trap('SIGTERM') { Scheduler.terminate }
    Scheduler.run
  end
end
# scheduler.rb
module Scheduler

  JOB_SCHEDULES = {
    # Every 5 minutes
    '*/5 * * * *' => [
      SidekiqJobClassA
    ],
    # Every 15 minutes
    '*/15 * * * *' => [
      SidekiqJobClassB,
      SidekiqJobClassC
    ],
    # etc.
  }

  class << self
    def run
      # This is simplified, but it basically boils down to this

      job_schedules.each do |cron_string, jobs|
        # First there's a phase were we execute any jobs that were skipped while
        # the scheduler was restarting, which we determine by pulling the last execution
        # of each cron string from redis.
        jobs.each { |job| job.perform_async } if last_execution_was_skipped

        # Schedule occurrences going forward
        scheduler.cron(cron_string) do
          jobs.each { |job| job.perform_async } # Run each Sidekiq job, which means writing to redis
          update_last_occurrence(cron_string, now) # Update the last execution of the cron string in redis
        end
      end

      scheduler.cron '* * * * *' do
        # So we can make sure the scheduler is doing its job
        Rails.logger.info "scheduler heartbeat"
      end

      scheduler.join
    end

    def terminate
      scheduler.shutdown(:wait)
    end

    private

    def job_schedules
      JOB_SCHEDULES
    end

    def scheduler
      Rufus::Scheduler.singleton
    end
  end
end

My mistake!!!! This is not a Rufus issue.

As soon as I posted this I realized that the issue was something with our logging service that was making those logs disappear. Rufus appears to be working perfectly after all!

@jmanian no worries. I am glad you sorted it out. Thanks for sharing the investigation details!