Rufus stops running cron jobs for a while, and then starts again
jmanian opened this issue · 2 comments
Apologies in advance if this is the wrong forum for my issue. I'm happy to repost wherever is appropriate.
We use Rufus (3.5.2) in my app to run periodic jobs (as frequent as every 5 minutes and infrequent as weekly). Our app is a Rails app hosted on Heroku, but we don't use Rufus in the Rails process. Instead Rufus runs in a dedicated scheduler
process, and the only thing that actually happens in the scheduler
process is it initiates Sidekiq jobs (which means that it writes to redis).
Occasionally Rufus will suddenly stop executing the cron jobs for a couple hours, and then suddenly start again. When this happens there are no errors or anything else in our logs. It just goes silent for a while and then continues again later. Do you have any idea how this could be happening? Maybe there's something wrong with my setup? Maybe it's a bug in Rufus?
Heroku restarts each process roughly every 24 hours, but these blackouts don't necessarily happen anywhere near the Heroku restarts. For instance on Monday it went dark at 17:31 and then came back at 20:33, so just over 3 hours. The prior restart was at 15:38, almost 2 hours before, and it didn't restart again until the following day.
The way that I can tell when it stops executing is I have a log statement set up with * * * * *
, so when that log statement goes missing I know that Rufus has stopped. I can also see in the logs that the Sidekiq jobs are not being initiated.
Here's an abbreviated version of the code for our scheduler.
In our Procfile
:
scheduler: bundle exec rake scheduler:run
# scheduler.rake
namespace :scheduler do
task :run => :environment do
# on the command line in dev ctrl-C gives a SIGINT
# heroku sends a SIGTERM when it restarts
trap('SIGINT') { Scheduler.terminate }
trap('SIGTERM') { Scheduler.terminate }
Scheduler.run
end
end
# scheduler.rb
module Scheduler
JOB_SCHEDULES = {
# Every 5 minutes
'*/5 * * * *' => [
SidekiqJobClassA
],
# Every 15 minutes
'*/15 * * * *' => [
SidekiqJobClassB,
SidekiqJobClassC
],
# etc.
}
class << self
def run
# This is simplified, but it basically boils down to this
job_schedules.each do |cron_string, jobs|
# First there's a phase were we execute any jobs that were skipped while
# the scheduler was restarting, which we determine by pulling the last execution
# of each cron string from redis.
jobs.each { |job| job.perform_async } if last_execution_was_skipped
# Schedule occurrences going forward
scheduler.cron(cron_string) do
jobs.each { |job| job.perform_async } # Run each Sidekiq job, which means writing to redis
update_last_occurrence(cron_string, now) # Update the last execution of the cron string in redis
end
end
scheduler.cron '* * * * *' do
# So we can make sure the scheduler is doing its job
Rails.logger.info "scheduler heartbeat"
end
scheduler.join
end
def terminate
scheduler.shutdown(:wait)
end
private
def job_schedules
JOB_SCHEDULES
end
def scheduler
Rufus::Scheduler.singleton
end
end
end
My mistake!!!! This is not a Rufus issue.
As soon as I posted this I realized that the issue was something with our logging service that was making those logs disappear. Rufus appears to be working perfectly after all!