
A light-weight job scheduling system built on top of resque


resque-scheduler

Description

Resque-scheduler is an extension to Resque that adds support for queueing items in the future.

Requires redis >=1.3.

Job scheduling is supported in two different ways: Recurring (scheduled) and Delayed.

Scheduled jobs are like cron jobs, recurring on a regular basis. Delayed jobs are resque jobs that you want to run at some point in the future. The syntax is pretty self-explanatory:

Resque.enqueue_in(5.days, SendFollowUpEmail) # run a job in 5 days
# or
Resque.enqueue_at(5.days.from_now, SomeJob) # run SomeJob at a specific time

Documentation

This README covers what most people need to know. If you're looking for details on individual methods, you might want to try the rdoc.

Installation

To install:

gem install resque-scheduler

Adding the resque:scheduler rake task:

require 'resque_scheduler/tasks'    

There are three things resque-scheduler needs to know about in order to do its job: the schedule, where redis lives, and which queues to use. The easiest way to configure these things is via the rake task. By default, resque-scheduler depends on the "resque:setup" rake task. Since you probably already have this task, and resque-scheduler needs to know pretty much everything resque needs to know, let's just put our configuration there.

# Resque tasks
require 'resque/tasks'
require 'resque_scheduler/tasks'    

namespace :resque do
  task :setup do
    require 'yaml' # needed for YAML.load_file below
    require 'resque'
    require 'resque_scheduler'
    require 'resque/scheduler'      
  
    # you probably already have this somewhere
    Resque.redis = 'localhost:6379'
    
    # The schedule doesn't need to be stored in a YAML, it just needs to
    # be a hash.  YAML is usually the easiest.
    Resque.schedule = YAML.load_file('your_resque_schedule.yml')
    
    # If your schedule already has +queue+ set for each job, you don't
    # need to require your jobs.  This can be an advantage since it's
    # less code that resque-scheduler needs to know about. But in a small
    # project, it's usually easier to just include your job classes here.
    # So, something like this:
    require 'jobs'
    
    # If you want to be able to dynamically change the schedule,
    # uncomment this line.  A dynamic schedule can be updated via the
    # Resque::Scheduler.set_schedule (and remove_schedule) methods.
    # When dynamic is set to true, the scheduler process looks for 
    # schedule changes and applies them on the fly.
    # Note: This feature is only available in >=2.0.0.
    #Resque::Scheduler.dynamic = true
  end
end

The scheduler process is just a rake task which is responsible for both queueing items from the schedule and polling the delayed queue for items ready to be pushed on to the work queues. For obvious reasons, this process never exits.

$ rake resque:scheduler 

Supported environment variables are VERBOSE and MUTE. Either one takes effect when it is set to any nonempty value. VERBOSE simply dumps more output to stdout. MUTE does the opposite and silences all output. MUTE supersedes VERBOSE.
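The precedence described above can be sketched in plain Ruby. This is only an illustration of the rule, not resque-scheduler's actual code; the method name output_mode and the symbols it returns are hypothetical.

```ruby
# Hypothetical helper (not part of resque-scheduler): decide how chatty
# the process should be, given a hash-like environment.
def output_mode(env = ENV)
  # Any nonempty value counts as "set"; unset or empty does not.
  mute    = !env['MUTE'].to_s.empty?
  verbose = !env['VERBOSE'].to_s.empty?

  return :mute if mute # MUTE supersedes VERBOSE

  verbose ? :verbose : :normal
end

output_mode({ 'VERBOSE' => '1' })                # => :verbose
output_mode({ 'VERBOSE' => '1', 'MUTE' => '1' }) # => :mute
output_mode({})                                  # => :normal
```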

NOTE: You DO NOT want to run more than one instance of the scheduler. Doing so will result in the same job being queued more than once. You only need one instance of the scheduler running per resque instance (regardless of number of machines).

If the scheduler process goes down for whatever reason, the delayed items that should have fired during the outage will fire once the scheduler process is started back up again (regardless of it being on a new machine). Missed scheduled jobs, however, will not fire upon recovery of the scheduler process.

Delayed jobs

Delayed jobs are one-off jobs that you want to be put into a queue at some point in the future. The classic example is sending email:

Resque.enqueue_in(5.days, SendFollowUpEmail, :user_id => current_user.id)

This will store the job in the resque delayed queue for 5 days. At that point the scheduler process pulls it from the delayed queue and puts it in the appropriate work queue for the given job, where it will be processed as soon as a worker is available (just like any other resque job).

NOTE: The job does not fire exactly at the time supplied. Rather, once that time is in the past, the job moves from the delayed queue to the actual resque work queue and will be completed as workers are free to process it.

Also supported is Resque.enqueue_at which takes a timestamp to queue the job, and Resque.enqueue_at_with_queue which takes both a timestamp and a queue name.

The delayed queue is stored in redis and is persisted in the same way the standard resque jobs are persisted (redis writing to disk). Delayed jobs differ from scheduled jobs in that if your scheduler process is down or workers are down when a particular job is supposed to be queued, they will simply "catch up" once they are started again. Jobs are guaranteed to run (provided they make it into the delayed queue) after their given queue_at time has passed.

One other thing to note is that insertion into the delayed queue is O(log(n)) since the jobs are stored in a redis sorted set (zset). I can't imagine this being an issue for someone since redis is stupidly fast even at log(n), but full disclosure is always best.
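To make the delayed-queue behaviour concrete, here is a minimal in-memory sketch in plain Ruby. It is not how resque-scheduler is implemented (the real queue lives in a redis sorted set); the class DelayedQueueSketch and its method pop_due are hypothetical names chosen for illustration, and its enqueue_at and enqueue_in only mimic the shape of the Resque calls. Jobs are keyed by timestamp and handed over only once that timestamp is in the past, which is also why missed jobs "catch up" after an outage.

```ruby
# In-memory stand-in for the delayed queue (illustration only).
class DelayedQueueSketch
  def initialize
    @items = [] # [timestamp, job] pairs, kept ordered by timestamp
  end

  # Store a job to be handed to a work queue once +time+ has passed.
  def enqueue_at(time, klass, *args)
    timestamp = time.to_i
    # Linear scan for the insert position; a redis zset does this in O(log n).
    index = @items.index { |ts, _job| ts > timestamp } || @items.size
    @items.insert(index, [timestamp, { class: klass, args: args }])
    self
  end

  # Same as enqueue_at, but relative to the current time.
  def enqueue_in(seconds, klass, *args)
    enqueue_at(Time.now + seconds, klass, *args)
  end

  # Remove and return every job whose timestamp is not in the future.
  def pop_due(now = Time.now)
    due, @items = @items.partition { |ts, _job| ts <= now.to_i }
    due.map(&:last)
  end
end

queue = DelayedQueueSketch.new
start = Time.at(1_000)
queue.enqueue_at(start + 60, 'SendFollowUpEmail', 42)
queue.enqueue_at(start + 10, 'SomeJob')

queue.pop_due(start)        # => [] (nothing is due yet)
queue.pop_due(start + 3600) # both jobs, earliest first, despite checking late
```

Class names are passed as strings here only so the sketch runs without any job classes defined.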

Removing Delayed jobs

If you need to cancel a delayed job, you can do so like this:

# after you've enqueued a job like:
Resque.enqueue_at(5.days.from_now, SendFollowUpEmail, :user_id => current_user.id)
# remove the job with exactly the same parameters:
Resque.remove_delayed(SendFollowUpEmail, :user_id => current_user.id)
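"Exactly the same parameters" matters because a delayed job is identified by its class and arguments. Below is a minimal sketch of that matching rule using plain Ruby and the standard JSON library. The helper names encode_job and remove_matching are hypothetical, and comparing jobs by their serialized form is an assumption made for the illustration rather than something this README states.

```ruby
require 'json'

# Hypothetical helper: serialize a job roughly the way a queue might store it.
def encode_job(klass, *args)
  JSON.generate({ 'class' => klass.to_s, 'args' => args })
end

# Hypothetical helper: delete every stored job whose serialized form matches
# the given class and arguments. Returns the number of jobs removed.
def remove_matching(stored, klass, *args)
  target = encode_job(klass, *args)
  before = stored.size
  stored.delete_if { |payload| payload == target }
  before - stored.size
end

stored = [
  encode_job('SendFollowUpEmail', { user_id: 1 }),
  encode_job('SendFollowUpEmail', { user_id: 2 })
]

remove_matching(stored, 'SendFollowUpEmail', { user_id: 3 }) # => 0, no match
remove_matching(stored, 'SendFollowUpEmail', { user_id: 1 }) # => 1
stored.size                                                  # => 1
```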

Scheduled Jobs (Recurring Jobs)

Scheduled (or recurring) jobs are logically no different than a standard cron job. They are jobs that run based on a fixed schedule which is set at startup.

The schedule is a list of Resque worker classes with arguments and a schedule frequency (in crontab syntax). The schedule is just a hash, but is most likely stored in a YAML file like so:

queue_documents_for_indexing:
  cron: "0 0 * * *"
  class: QueueDocuments
  queue: high
  args: 
  description: "This job queues all content for indexing in solr"

clear_leaderboards_contributors:
  cron: "30 6 * * 1"
  class: ClearLeaderboards
  queue: low
  args: contributors
  description: "This job resets the weekly leaderboard for contributions"

The queue value is optional, but if left unspecified resque-scheduler will attempt to get the queue from the job class, which means it needs to be defined. If you're getting "uninitialized constant" errors, you probably need to either set the queue in the schedule or require your jobs in your "resque:setup" rake task.
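Since the schedule is just a hash, it can be inspected with nothing more than Ruby's standard YAML library. The sketch below parses a schedule like the one above (with queue deliberately dropped from the second entry) and lists the entries that omit queue, that is, the ones whose job classes would have to be loadable. The helper name entries_without_queue is hypothetical and not part of resque-scheduler.

```ruby
require 'yaml'

SCHEDULE_TEXT = <<~SCHEDULE
  queue_documents_for_indexing:
    cron: "0 0 * * *"
    class: QueueDocuments
    queue: high
    args:
    description: "This job queues all content for indexing in solr"

  clear_leaderboards_contributors:
    cron: "30 6 * * 1"
    class: ClearLeaderboards
    args: contributors
    description: "This job resets the weekly leaderboard for contributions"
SCHEDULE

# Hypothetical helper: names of schedule entries with no queue set.
def entries_without_queue(schedule)
  schedule.select { |_name, config| config['queue'].to_s.empty? }.keys
end

schedule = YAML.safe_load(SCHEDULE_TEXT)

schedule['queue_documents_for_indexing']['cron'] # => "0 0 * * *"
schedule['queue_documents_for_indexing']['args'] # => nil (empty value)
entries_without_queue(schedule) # => ["clear_leaderboards_contributors"]
```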

NOTE: Six-parameter crons are also supported (as they are supported by rufus-scheduler, which powers the resque-scheduler process). This allows you to schedule jobs per second (i.e. "30 * * * * *" would fire a job at 30 seconds past every minute).
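As a rough illustration of how the five- and six-field forms differ, the sketch below labels the fields of a cron string, treating the extra leading field as seconds. It does not call rufus-scheduler; cron_fields is a hypothetical helper, and the field order is the one implied by the example above.

```ruby
FIELD_NAMES = %i[second minute hour day_of_month month day_of_week].freeze

# Hypothetical helper: map each field of a 5- or 6-field cron string to a name.
def cron_fields(expression)
  parts = expression.split
  unless [5, 6].include?(parts.size)
    raise ArgumentError, "expected 5 or 6 fields, got #{parts.size}"
  end

  # A five-field string has no seconds, so only the last five names apply.
  FIELD_NAMES.last(parts.size).zip(parts).to_h
end

cron_fields('30 6 * * 1')   # minute 30, hour 6, day_of_week 1; no :second key
cron_fields('30 * * * * *') # second 30 of every minute
```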

A big shout out to rufus-scheduler for handling the heavy lifting of the actual scheduling engine.

Support for resque-status (and other custom jobs)

Some Resque extensions like resque-status use custom job classes with a slightly different API signature. Resque-scheduler isn't trying to support all existing and future custom job classes. Instead, it supports a schedule flag so you can extend your custom class and make it support scheduled jobs.

Let's pretend we have a JobWithStatus class called FakeLeaderboard:

class FakeLeaderboard < Resque::JobWithStatus
  def perform
    # do something and keep track of the status
  end
end

And then a schedule:

create_fake_leaderboards:
  cron: "30 6 * * 1"
  queue: scoring
  custom_job_class: FakeLeaderboard
  args: 
  rails_env: demo
  description: "This job will auto-create leaderboards for our online demo and the status will update as the worker makes progress"

If your extension doesn't support scheduled jobs, you would need to extend the custom job class to support the #scheduled method:

module Resque
  class JobWithStatus
    # Wrapper API to forward a Resque::Job creation API call into
    # a JobWithStatus call.
    def self.scheduled(queue, klass, *args)
      create(*args)
    end
  end
end
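To see why the scheduled hook is needed, here is a self-contained sketch of how a scheduler could dispatch a schedule entry. Everything in it is a stand-in: StatusJobStub and dispatch_entry are hypothetical and only mimic the shape of the call described above, where the custom class receives the queue, the job class name, and the arguments. It is not resque-scheduler's actual dispatch code.

```ruby
# Stand-in for a custom job class such as JobWithStatus (illustration only).
class StatusJobStub
  def self.created
    @created ||= []
  end

  def self.create(*args)
    created << args
  end

  # Same signature as the wrapper above: ignore queue and klass, forward args.
  def self.scheduled(_queue, _klass, *args)
    create(*args)
  end
end

# Hypothetical dispatcher: use custom_job_class when the entry names one,
# otherwise record a plain [queue, class, args] triple in plain_queue.
def dispatch_entry(config, plain_queue)
  raw  = config['args']
  args = raw.is_a?(Array) ? raw : [raw].compact
  custom = config['custom_job_class']

  if custom
    klass = config['class'] || custom
    Object.const_get(custom).scheduled(config['queue'], klass, *args)
  else
    plain_queue << [config['queue'], config['class'], args]
  end
end

plain = []
dispatch_entry({ 'queue' => 'scoring', 'custom_job_class' => 'StatusJobStub',
                 'args' => nil }, plain)
dispatch_entry({ 'queue' => 'low', 'class' => 'ClearLeaderboards',
                 'args' => 'contributors' }, plain)

StatusJobStub.created # => [[]]
plain                 # => [["low", "ClearLeaderboards", ["contributors"]]]
```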

resque-web Additions

Resque-scheduler also adds two tabs to the resque-web UI. One is for viewing (and manually queueing) the schedule and one is for viewing pending jobs in the delayed queue.

The Schedule tab: [screenshot: The Schedule Tab]

The Delayed tab: [screenshot: The Delayed Tab]

To get these to show up you need to pass a file to resque-web to tell it to include the resque-scheduler plugin. Unless you're running redis on localhost, you probably already have this file. It probably looks something like this:

require 'resque' # include resque so we can configure it
Resque.redis = "redis_server:6379" # tell Resque where redis lives

Now, you want to add the following:

# This will make the tabs show up.
require 'resque_scheduler'

As of resque-scheduler 2.0, it's no longer necessary to have the resque-web process aware of the schedule because it reads it from redis. But prior to 2.0, you'll want to make sure you load the schedule in this file as well. Something like this:

Resque.schedule = YAML.load_file(File.join(RAILS_ROOT, 'config/resque_schedule.yml')) # load the schedule

Now make sure you're passing that file to resque-web like so:

resque-web ~/yourapp/config/resque_config.rb

That should make the scheduler tabs show up in resque-web.

Plagiarism alert

This was intended to be an extension to resque and so resulted in a lot of the code looking very similar to resque, particularly in resque-web and the views. I wanted it to be similar enough that someone familiar with resque could easily work on resque-scheduler.

Contributing

For bugs or suggestions, please just open an issue on GitHub.

Patches are always welcome.