/rpqueue

Redis Priority Queue offers a priority/timeline based queue for use with Redis

Primary LanguagePythonGNU Lesser General Public License v2.1LGPL-2.1

Description

This package intends to offer a priority-based remote task queue solution using Redis as the transport and persistence layer, and JSON for a common interchange format.

Semantically, this module implements a 0/1 or 1+ queue with optional retries. That is, it attempts to execute every task once by default, or >1 manually, or >1 automatically with 'visibility timeouts'.

If a 'manual' retry task raises an exception, it will not automatically retry, but you can manually retry the task and specify the maximum attempts. Similarly, for tasks with visibility timeouts, if the task rasises an exception or doesn't complete, it will be retried up to the limit of retries provided.

See the Retries section below.

Full documentation is available: https://josiahcarlson.github.io/rpqueue/

Getting started

In order to execute tasks, you must ensure that rpqueue knows about your tasks that can be executed, you must configure rpqueue to connect to your Redis server, then you must start the task execution daemon:

from mytasks import usertasks1, usertasks2, ...
import rpqueue

rpqueue.set_redis_connection_settings(host, port, db)
rpqueue.execute_tasks()

Alternatively, rpqueue offers a command-line interface to do the same, though you must provide the name of a module or package that imports all modules or packages that define tasks that you want to run. For example:

# tasks.py
from tasks import accounting, cleanup, ...
# any other imports or configuration necessary, put them here

# run from the command-line
python -m rpqueue.run --module=tasks --host=... --port=... --db=...

Example uses

Say that you have a module usertasks1 with a task to be executed called echo_to_stdout. Your module may look like the following:

from rpqueue import task

@task
def echo_to_stdout(message):
    print(message)

To call the above task, you would use:

echo_to_stdout.execute(...)
echo_to_stdout.execute(..., delay=delay_in_seconds)

You can also schedule a task to be repeatedly executed with the periodic_task decorator:

@periodic_task(25, queue="low")
def function1():
    # Will be executed every 25 seconds from within the 'low' queue.
    pass

Retries

Tasks may be provided an optional attempts argument, which specifies the total number of times the task will try to be executed before failing. By default, all tasks have attempts set at 1, unless otherwise specified:

@task(attempts=3)
def fail_until_zero(value, **kwargs):
    try:
        if value != 0:
            value -= 1
            raise Exception
    except:
        fail_until_zero.retry(value, **kwargs)
    else:
        print "succeeded"

If passed the value 3, "succeeded" will never be printed. Why? The first try has value=3, attempts=3, and fails. The second pass has value=2, attempts=2, and fails. The third pass has value=1, attempts=1, fails, and the retry returns without retrying. The attempts value is the total number of attempts, including the first, and all retries.

Automatic retries with vis_timeout

Included with rpqueue 0.30.0 or later, you can give tasks (and now data queues) a visibility timeout, which is (per Amazon SQS-style semantics) a time for how long the task has to execute correctly before being automatically re-entered into the queue.:

@task(attempts=20, vis_timeout=5, use_dead=False)
def usually_eventually_succeed(**kwargs):
    # (4/5)**20  is ~ 0.0115, so call chain fails about 1% of the time
    if not random.randrange(5):
        return "done!"

    time.sleep(6) # fail silently

Deadletter task queue

If you would like to know which tasks failed, failed calls can be automatically entered into a deadletter queue.:

@rpqueue.task(attempts=5, vis_timeout=5, use_dead=True)
def fails_to_dead(**kwargs):
    # (4/5)**5  is 0.32768, so call chain fails about 33% of the time
    if not random.randrange(5):
        return "done!"

    time.sleep(6) # fail silently

task_deadletter = rpqueue.Data(rpqueue.DEADLETTER_QUEUE, is_tasks=True)
dead_tasks = task_deadletter.get_data(items=5)

See help(rpqueue.Data) for more.

Waiting for task execution

As of version .19, RPQueue offers the ability to wait on a task until it begins execution:

@task
def my_task(args):
    # do something

executing_task = my_task.execute()
if executing_task.wait(5):
    # task is either being executed, or it is done
else:
    # task has not started execution yet

With the ability to wait for a task to complete, you can have the ability to add deadlines by inserting a call to executing_task.cancel() in the else block above.

Automatically storing results of tasks

As of version .19, RPQueue offers the ability to store the result returned by a task as it completes:

@task(save_results=30)
def task_with_results():
    return 5

etask = task_with_results.execute()
if etask.wait(5):
    print etask.result # should print 5

The save_results argument can be passed to tasks, periodic tasks, and even cron tasks (described below). The value passed will be how long the result is stored in Redis, in seconds. All results must be json-encodable.

Additional features

Crontab

Support for cron_tasks using a crontab-like syntax requires the Python crontab module: http://pypi.python.org/pypi/crontab/ , allowing for:

@cron_task('0 5 tue * *')
def function2():
    # Will be executed every Tuesday at 5AM.
    pass

Data queues

Put data in queues, not tasks. I mean, should have probably been here from the start, but it's here now.

Convenient features:
  • 1-1000 data items per read, at your discretion
  • vis_timeout
  • attempts
  • use_dead
  • refresh data if you want to keep working on it (we don't identify the reader, so you should use an explicit lock if you want guaranteed exclusivity)

A few examples:

# 0/1 queue
dq = rpqueue.Data('best_effort')
dq.put_data([item1, item2, item3, ...])
items = dq.get_data(2) # {<uuid>: <item>, ...}

# Up to 5 deliveries, with 5 second delay before re-insertion
dq5 = rpqueue.Data('retry_processing', attempts=5, vis_timeout=5)
dq5.put_data([item1, item2, item3, ...])
items = dq5.get_data(2) # {<uuid>: <item>, ...}
items2 = dq5.get_data(2, vis_timeout=20) # override timeout on read
refreshed = set(dq5.refresh_data(items, vis_timeout=7)) # refresh our lock
items = {k:v for k,v in items if k in refreshed}
dq5.done_data(items)
dq5.done_data(items2)

# Up to 1 try with a 5 second delay before insertion into deadletter queue
dqd = rpqueue.Data('retry_processing', attempts=1, vis_timeout=5, use_dead=True)
dqd.put_data([item1, item2, item3, ...])
items = dqd.get_data(2) # {<uuid>: <item>, ...}
items2 = dqd.get_data(2, vis_timeout=20) # override timeout on read
refreshed = set(dqd.refresh_data(items, vis_timeout=7)) # refresh our lock
items = {k:v for k,v in items if k in refreshed}
dqd.done_data(items)
time.sleep(20)
# items2 are now "dead"
dead = rpqueue.Data(rpqueue.DEADLETTER_QUEUE)
dead_items = dead.get_data(2) # these have a different format, see docs!

A longer example closer to what would be seen in practice:

aggregate_queue = rpqueue.Data("aggregate_stats", vis_timeout=30, use_dead=False)

@rpqueue.periodic_task(60)
def aggregate():
    # If vis_timeout is not provided, will use the queue default.
    # If vis_timeout is <= 0, will act as a 0/1 queue, and later "done data"
    # calling is unnecessary.
    data = aggregate_queue.get_data(items=100, vis_timeout=5)
    # data is a dictionary: {<uuid>: <item>, <uuid>: <item>, ...}
    # do something with data
    done_with = []
    for id, value in data.items():
        # do something with value
        done_with.append(id)

    aggregate_queue.refresh_data(data) # still working!

    # You can pass any iterator that naturally iterates over the uuids you
    # want to be "done" with.
    aggregate_queue.done_data(done_with)
    # also okay:
    # aggregate_queue.done_data(data)
    # aggregate_queue.done_data(tuple(data))
    # aggregate_queue.done_data(list(data))

Sponsors

Don't like LGPL? Sponsor the project and get almost any license you want.

This project has been partly sponsored by structd.com and hCaptcha.com, both of whom received licenses that match their needs appropriately. Historically, rpqueue has been used to help support the delivery of millions of food orders at chownow.com, billions of ad impressions for system1.com, and billions of captchas for hCaptcha.com.

Thank you to our sponsors and those who have consumed our services.

You are welcome for the good service.

Your company link here.