
A distributed rate limiting WSGI middleware.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Turnstile Distributed Rate-Limiting Middleware

Turnstile is a piece of WSGI middleware that performs true distributed rate-limiting. System administrators can run an API on multiple nodes, then place this middleware in the pipeline prior to the application. Turnstile uses a Redis database to track the rate at which users are hitting the API, and can then apply configured rate limits, even if each request was made against a different API node.

Installing Turnstile

Like many Python packages, Turnstile can be installed using pip:

pip install turnstile

From within your Turnstile source directory, you can install the dependencies required by Turnstile with the following command:

pip install -r .requires

If you would like to run the tests, you can install the additional test dependencies in the same way:

pip install -r .test-requires

Then, to run the test suite, use:

nosetests -v

Alternatively, the full test suite can be run in virtual environments managed by the tox tool; this is the recommended way for developers to run the test suite. Four environments are defined: "py26" and "py27" run the tests under Python 2.6 and Python 2.7, respectively; "pep8" runs the pep8 style-compliance tool (intended for developers only); and "cover" runs the test suite under the default Python installation with coverage enabled. The coverage report generated by the "cover" environment is written as HTML files in the "cov_html" subdirectory. An example tox invocation:

tox -e py27,pep8

Adding and Configuring Turnstile

Turnstile is intended for use with PasteDeploy-style configuration files. It is a filter, and should be placed in the WSGI pipeline at a point where the limit classes used with Turnstile can access the information they need to make rate-limiting decisions. (The turnstile.limits:Limit class provided by Turnstile requires no additional information, as it does not differentiate between users of your application.)

The filter section of the PasteDeploy configuration file must also contain enough information for Turnstile to access the Redis database. Other options, such as enable, may be configured there as well. The simplest Turnstile configuration would be:

[filter:turnstile]
use = egg:turnstile#turnstile
redis.host = <your Redis database host name or IP>

The following are the recognized configuration options:

compactor.compactor_key
Specifies the sorted set that the compactor daemon uses for communication of buckets that need to be compacted. (See below for more information about the purpose of the compactor daemon.) This option defaults to "compactor".
compactor.compactor_lock
When multiple compactor daemons are being run, it is necessary to serialize their access to the sorted set specified by compactor.compactor_key. This option specifies a Redis key containing the lock, and it defaults to "compactor_lock".
compactor.compactor_timeout
If a compactor daemon (or its host) crashes while holding the lock, the lock will eventually time out, to allow other compactor daemons to run. This option specifies the timeout in seconds, and defaults to 30.
compactor.max_age
The bucket processing logic adds special "summarize" records to the bucket representation, to signal to other Turnstile instances that a request to summarize the bucket has been submitted. These records must age for a minimum amount of time, to ensure that all Turnstile instances have seen them, before the compactor daemon can run on the bucket. However, if the summarize request to the compactor daemon is lost, there must be a timeout, to ensure that a new request to summarize a given bucket may be submitted. This option specifies a maximum age for a "summarize" record, in seconds, and defaults to 600.
compactor.max_updates
The bucket processing logic adds special "summarize" records to the bucket representation, to signal to other Turnstile instances that a request to summarize the bucket has been submitted. These requests are generated when the number of update records in the bucket representation exceeds the value of this option. This option must be specified to enable the compaction logic; a good value would be 30.
compactor.min_age
The bucket processing logic adds special "summarize" records to the bucket representation, to signal to other Turnstile instances that a request to summarize the bucket has been submitted. These records must age for a minimum amount of time, to ensure that all Turnstile instances have seen them, before the compactor daemon can run on the bucket. This option specifies the minimum age for a "summarize" record, in seconds, and defaults to 30.
compactor.sleep
The compactor daemon reads bucket keys from a sorted set in the Redis database. If no keys are present, it will read from the sorted set again, in a loop. To ensure that the compactor daemon does not consume too much CPU time, after each read that returns no bucket to compact, it will sleep for the number of seconds defined by this option. The default is 5.
config

Allows specification of an alternate configuration file. This can be used to generate a single file which can be shared by WSGI servers using the Turnstile middleware and the various provided tools. This can also allow for separation of code-related options, such as the enable option, from pure configuration, such as the redis.host option. The configuration file is an INI-formatted file, with section names corresponding to the first segment of the configuration option name. That is, the redis.host option would be set as follows:

[redis]
host = <your Redis database host name or IP>

Configuration options which have no prefix are grouped under the [turnstile] section of the file, as follows:

[turnstile]
status = 404 Not Found

Note that specifying the config option in the [turnstile] section will have no effect; it is not possible to cause another configuration file to be included in this way.
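The mapping from INI sections to option names can be illustrated with a short sketch. The helper below, flatten_config(), is hypothetical and is not part of Turnstile; it simply applies the rule described above using the standard library's configparser: options in a [section] become section.option, options in [turnstile] keep their bare names, and a config entry in [turnstile] is dropped.

```python
import configparser


def flatten_config(path):
    """Flatten an INI file into Turnstile-style dotted option names.

    Hypothetical helper illustrating the naming rule described above;
    it is not Turnstile's own configuration loader.
    """
    # interpolation=None keeps '%' characters (e.g. in passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)

    options = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if section == 'turnstile':
                # Unprefixed options live in [turnstile]; 'config' has no
                # effect there, so another file cannot be included this way.
                if key == 'config':
                    continue
                options[key] = value
            else:
                # [redis] host = ... becomes 'redis.host'.
                options['%s.%s' % (section, key)] = value
    return options
```

Only the first segment of an option name becomes the section name, so control.remote.port would be written as remote.port in a [control] section.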

control.channel
Specifies the channel that the control daemon listens on. (See below for more information about the purpose of the control daemon.) This option defaults to "control".
control.errors_channel
Specifies the channel that the control daemon (see below) reports errors to. This option defaults to "errors".
control.errors_key
Specifies the key of a set in the Redis database to which errors will be stored. This option defaults to "errors".
control.limits_key
The key under which the limits are stored in the database. See the section on tools for more information on how to load and dump the limits stored in the Redis database. This option defaults to "limits".
control.node_name
A recognizable name for the node. Currently, this name is only reported in response to a "ping" command sent to the control daemon (see below), where it may be used to verify that all hosts responded.
control.reload_spread
When limits are changed in the database, a command is sent to the control daemon (see below) to cause the limits to be reloaded. As having all nodes hit the Redis database simultaneously may overload the database, this option, if set, allows the reload to be spread out randomly within a configured interval. This option should be set to the size of the desired interval, in seconds. If not set, limits will be reloaded immediately by all nodes.
control.remote
If set to "on", "yes", "true", or "1", Turnstile will connect to a remote control daemon (see the remote_daemon tool described below). This enables Turnstile to be compatible with WSGI servers which use multiple worker processes. Note that the configuration values control.remote.authkey, control.remote.host, and control.remote.port are required.
control.remote.authkey
Set to an authentication key, for use when control.remote is enabled. Must be the value used by the invocation of remote_daemon.
control.remote.host
Set to a host name or IP address, for use when control.remote is enabled. Must be the value used by the invocation of remote_daemon.
control.remote.port
Set to a port number, for use when control.remote is enabled. Must be the value used by the invocation of remote_daemon.
control.shard_hint
Can be used to set a sharding hint which will be provided to the listening thread of the control daemon (see below). This hint is not used by the default Redis Connection class.
enable

Contains a list of turnstile.preprocessor and turnstile.postprocessor entrypoint names. Each name is resolved into a preprocessor and postprocessor function (missing entrypoints are ignored) and installed, as with the preprocess and postprocess configuration options. Note that the postprocessors will be in the reverse ordering of the list contained in this option. See the section on entrypoints for more information.

Note that, if enable is used, preprocess and postprocess will be ignored.
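A minimal sketch of the ordering rule, assuming the enable value has already been split into a list of names and using plain dicts (hypothetical stand-ins) in place of the turnstile.preprocessor and turnstile.postprocessor entrypoint groups:

```python
def resolve_enable(names, preprocessors, postprocessors):
    """Build preprocessor and postprocessor lists from an 'enable' list.

    'preprocessors' and 'postprocessors' are dicts standing in for the
    turnstile.preprocessor and turnstile.postprocessor entrypoint groups
    (a hypothetical simplification of real entrypoint lookup).
    """
    # Names with no matching entry are skipped, as missing entrypoints are.
    pre = [preprocessors[name] for name in names if name in preprocessors]
    # Postprocessors run in the reverse order of the 'enable' list, so the
    # extensions nest: the first one listed runs its preprocessor first
    # and its postprocessor last.
    post = [postprocessors[name] for name in reversed(names)
            if name in postprocessors]
    return pre, post
```

For example, enabling a, b, and c, where only a and c provide postprocessors, would run the postprocessors in the order c, a. (Treating the option value as a whitespace-separated list is an assumption of this example.)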

formatter

In previous versions of Turnstile, the only way to change the way the delay response was generated was to subclass turnstile.middleware.TurnstileMiddleware and override the format_delay() method; this subclass could then be used by specifying it as the value of the turnstile option. This version now allows the formatter to be explicitly specified, using this option.

Searches for the formatter in the turnstile.formatter entrypoint group; see the section on entrypoints for more information.

postprocess

Contains a list of postprocessor functions. During each request, each postprocessor will be called in turn, with the middleware object (from which the database handle and the configuration can be obtained) and the request environment as arguments. Note that any exceptions thrown by the postprocessors will not be caught, and request processing will be halted; this will likely result in a 500 error being returned to the user. Postprocessors run only after all limits have been processed; most applications will not need to install a postprocessor.

Searches for the postprocessor in the turnstile.postprocessor entrypoint group; see the section on entrypoints for more information.

Note that, if enable is used, this option will be ignored.

preprocess

Contains a list of preprocessor functions. During each request, each preprocessor will be called in turn, with the middleware object (from which the database handle and the configuration can be obtained) and the request environment as arguments. Note that any exceptions thrown by the preprocessors will not be caught, and request processing will be halted; this will likely result in a 500 error being returned to the user. Preprocessors are run before the limits are processed.

Searches for the preprocessor in the turnstile.preprocessor entrypoint group; see the section on entrypoints for more information.

Note that, if enable is used, this option will be ignored.

redis.connection_pool

Identifies the connection pool class to use. If not provided, defaults to redis.ConnectionPool. This may be used to allow client-side sharding of the Redis database.

Searches for the connection pool class in the turnstile.connection_pool entrypoint group; see the section on entrypoints for more information.

redis.connection_pool.connection_class

Identifies the connection class to use. If not provided, the appropriate redis.Connection subclass for the configured connection is used (redis.Connection if redis.host is specified, else redis.UnixDomainSocketConnection).

Searches for the connection class in the turnstile.connection_class entrypoint group; see the section on entrypoints for more information.

redis.connection_pool.max_connections
Allows specification of the maximum number of connections to the Redis database. Optional.
redis.connection_pool.parser_class

Identifies the parser class to use. Optional. This is an advanced feature of the redis package used by Turnstile.

Searches for the parser class in the turnstile.parser_class entrypoint group; see the section on entrypoints for more information.

redis.connection_pool.*
Any other configuration value provided in the redis.connection_pool.* hierarchy will be passed as keyword arguments to the configured connection pool class. Note that the values will be passed as strings.
redis.db
Identifies the specific sub-database of the Redis database to be used by Turnstile. If not provided, defaults to 0.
redis.host
Identifies the host name or IP address of the Redis database to connect to. Either redis.host or redis.unix_socket_path must be provided.
redis.password
If the Redis database has been configured to use a password, this option allows that password to be specified.
redis.port
Identifies the port the Redis database is listening on. If not provided, defaults to 6379.
redis.redis_client

Identifies a redis.StrictRedis subclass or analog, which will be used as the client library for communicating with the Redis database. This allows alternate clients which support clustering or sharding to be used by Turnstile.

Searches for the client class in the turnstile.redis_client entrypoint group; see the section on entrypoints for more information.

redis.socket_timeout
If provided, specifies an integer socket timeout for the Redis database connection.
redis.unix_socket_path
Names the UNIX socket on the local host for the local Redis database to connect to. Either redis.host or redis.unix_socket_path must be provided.
status
Contains the status code to return if rate limiting is tripped. This defaults to "413 Request Entity Too Large". Note that this value must start with the 3-digit HTTP code, followed by a space and the text corresponding to that status code. Also note that, regardless of the status code, Turnstile will include the Retry-After header in the response. (The value of the Retry-After header will be the integer number of seconds until the request can be retried.)
turnstile

If set, identifies an alternate class to use for the Turnstile middleware. This can be used in conjunction with subclassing turnstile.middleware:TurnstileMiddleware, which may be done to override how over-limit conditions are formatted.

Searches for the middleware class in the turnstile.middleware entrypoint group; see the section on entrypoints for more information.

This option is deprecated. To override the delay formatting function, use the formatter option.

Other configuration values are available to the preprocessors, the postprocessors, the delay formatters, and the turnstile.limits:Limit subclasses, but extreme care should be taken that such configurations remain in sync across the entire cluster.
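To show how the redis.* options above fit together, the following sketch converts a flat dictionary of option strings into keyword arguments for a redis-py client. The helper redis_client_kwargs() is hypothetical; it encodes the documented defaults (port 6379, database 0) and the rule that redis.host, if given, is used in preference to redis.unix_socket_path, but it ignores the redis.connection_pool.* and redis.redis_client options.

```python
def redis_client_kwargs(conf):
    """Translate flat 'redis.*' options into redis-py client arguments.

    Hypothetical helper mirroring the option descriptions above; it is not
    Turnstile's own code.  All incoming values are strings.
    """
    kwargs = {'db': int(conf.get('redis.db', 0))}

    if 'redis.host' in conf:
        # TCP connection; the port defaults to 6379.
        kwargs['host'] = conf['redis.host']
        kwargs['port'] = int(conf.get('redis.port', 6379))
    elif 'redis.unix_socket_path' in conf:
        # Local UNIX domain socket connection.
        kwargs['unix_socket_path'] = conf['redis.unix_socket_path']
    else:
        raise ValueError('either redis.host or redis.unix_socket_path '
                         'must be provided')

    if 'redis.password' in conf:
        kwargs['password'] = conf['redis.password']
    if 'redis.socket_timeout' in conf:
        kwargs['socket_timeout'] = int(conf['redis.socket_timeout'])
    return kwargs
```

The result could then be passed as redis.StrictRedis(**redis_client_kwargs(conf)); host, port, db, password, socket_timeout, and unix_socket_path are all keyword arguments accepted by redis-py's client class.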

Entrypoints

Turnstile takes many options which allow functions or classes to be specified, as indicated above. All of these options expect their values to be given in one of two forms. The first form, which was the only valid format for older versions of Turnstile, is the "module:name" format. However, Turnstile now has support for the pkg_resources "entrypoint" abstraction, which allows packages to define a set of entrypoints. Entrypoints are organized into groups, all having a similar interface; and each entrypoint has a given name. To use a function or class which has a declared entrypoint, simply use the name of that entrypoint. (Note that names are prohibited from containing colons, to distinguish between the two formats.)
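The two value forms can be told apart by the presence of a colon. The sketch below resolves either form; load_object() is a hypothetical name, and it uses the standard library's importlib.metadata rather than the pkg_resources API that Turnstile itself relies on:

```python
import importlib
from importlib import metadata


def load_object(spec, group):
    """Resolve a Turnstile-style option value to a Python object.

    A value containing a colon is treated as "module:name"; anything else
    is looked up as an entrypoint name within 'group'.
    """
    if ':' in spec:
        module_name, _, attr_path = spec.partition(':')
        obj = importlib.import_module(module_name)
        # Allow dotted attribute paths such as "module:Class.attr".
        for attr in attr_path.split('.'):
            obj = getattr(obj, attr)
        return obj

    eps = metadata.entry_points()
    if hasattr(eps, 'select'):  # Python 3.10 and later
        matches = list(eps.select(group=group, name=spec))
    else:  # Python 3.8/3.9 return a dict keyed by group name
        matches = [ep for ep in eps.get(group, []) if ep.name == spec]
    if not matches:
        raise LookupError('no entrypoint %r in group %r' % (spec, group))
    return matches[0].load()
```

For example, load_object('os.path:join', 'turnstile.formatter') returns os.path.join without consulting the entrypoint group at all.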

The following entrypoint groups are recognized by Turnstile:

turnstile.command

The control daemon accepts commands from remote callers. One of these commands is the "reload" command, which causes Turnstile to reload the limits configuration from the Redis database. A second built-in command is the "ping" command, which can be used to ensure all Turnstile instances are receiving command messages. It is possible to create additional commands by associating the command string with a function under this entrypoint group. The function has the following signature:

def func(daemon, *args):
    pass

The first argument will be the actual control daemon (which could be either a turnstile.control.ControlDaemon or a turnstile.remote.RemoteControlDaemon); the remaining arguments are the arguments passed to the command. See the turnstile_command tool for a way to submit arbitrary commands of this form.
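As a hedged illustration, the sketch below shows a hypothetical extra command, echo_command(), alongside a simple dispatcher that splits a control message on colons (matching message formats such as "ping:<channel>" and "reload:spread:<interval>") and calls the registered function. The dict passed as commands stands in for the turnstile.command entrypoint group; this is not the control daemon's actual code.

```python
def echo_command(daemon, *args):
    """Hypothetical extra command: record its arguments on the daemon."""
    daemon.received.append(args)


def dispatch(daemon, message, commands):
    """Split a control message on colons and invoke the matching command.

    'commands' is a dict standing in for the turnstile.command entrypoint
    group.  Returns True if a command was found and called.
    """
    command, *args = message.split(':')
    func = commands.get(command)
    if func is None:
        return False  # unknown commands are simply ignored in this sketch
    func(daemon, *args)
    return True
```

To register such a function for real, a package would declare it in the entry_points argument of its setup() call, for example {'turnstile.command': ['echo = mypkg.commands:echo_command']}, where mypkg.commands is a placeholder module.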

turnstile.connection_class
The default Redis database client uses either a redis.UnixDomainSocketConnection or a redis.Connection object to maintain the connection to the Redis database. The redis.connection_pool.connection_class configuration value allows this default to be overridden. Alternate classes will be searched for in this entrypoint group, if there is no colon (":") present in the configuration value. See the documentation for redis.Connection for details on this interface.
turnstile.connection_pool
The default Redis database client maintains connections in a pool, maintained as a redis.ConnectionPool object. The redis.connection_pool configuration value allows this default to be overridden. Alternate classes will be searched for in this entrypoint group, if there is no colon (":") present in the configuration value. See the documentation for redis.ConnectionPool for details on this interface.
turnstile.formatter

When the rate limiting logic determines that the request is rate-limited, Turnstile generates a response indicating that the REST client should try again after a certain delay. This response can be formatted in any desired way by using the formatter configuration option to specify an alternate function, which will be searched for under this entrypoint group. The formatter function has the following signature:

def formatter(status, delay, limit, bucket, environ, start_response):
    pass

The status is the configured status code for this Turnstile instance. The delay is a float value, specifying the length of the required delay in seconds. The limit and bucket values specify the actual underlying turnstile.limits.Limit and turnstile.limits.Bucket subclasses associated with that delay; alternate formatters can use the turnstile.limits.Limit.format() method to obtain a status and result entity specific to that limit. Finally, environ and start_response come from the WSGI pipeline; additional Turnstile configuration values can be retrieved from the turnstile.conf key in environ.
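A minimal sketch of a custom formatter with this signature follows. The function name and message text are hypothetical, and rounding the delay up to whole seconds for the Retry-After header is an assumption; since a replacement formatter builds the entire response, this sketch sets Retry-After itself.

```python
import math


def plain_text_formatter(status, delay, limit, bucket, environ,
                         start_response):
    """Hypothetical delay formatter returning a plain-text WSGI body.

    'limit' and 'bucket' are unused here; a richer formatter could call
    limit.format() to build a limit-specific response instead.
    """
    # Assumption: round the float delay up to an integer number of seconds.
    retry_after = int(math.ceil(delay))
    body = ('Rate limit exceeded; retry in %d seconds.\n'
            % retry_after).encode('utf-8')
    start_response(status, [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
        ('Retry-After', str(retry_after)),
    ])
    return [body]
```

It could then be selected with formatter = mypkg.formatters:plain_text_formatter, where mypkg.formatters is a placeholder module path.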

turnstile.limit
The setup_limits tool reads the limits configuration from an XML file. In that file, each limit has an associated limit class, specified by the "class" attribute of the <limit> element. When dumped using the dump_limits tool, this attribute will always be a "module:class" pair, but setup_limits recognizes short names, which will be searched for in this entrypoint group. See the documentation for turnstile.limits.Limit for details on this interface.
turnstile.middleware
Older versions of Turnstile allowed the formatter to be configured by subclassing turnstile.middleware.TurnstileMiddleware and overriding the format_delay() method. Although this is now deprecated, it is still possible, using the turnstile option in the configuration, to specify a subclass of TurnstileMiddleware that turnstile.middleware.turnstile_filter() should use. When no colon (":") is present in the turnstile configuration value, this is the entrypoint group that will be searched. See the documentation for TurnstileMiddleware for details on this interface.
turnstile.parser_class
The default Redis database client uses either a redis.connection.PythonParser or a redis.connection.HiredisParser object to parse the data stream from the Redis database. The redis.connection_pool.parser_class configuration value allows this default to be overridden. Alternate classes will be searched for in this entrypoint group, if there is no colon (":") present in the configuration value. See the documentation for redis.connection.PythonParser for details on this interface.
turnstile.postprocessor

Postprocessors run immediately after searching all the limits and verifying that the request should not be rate-limited. (They will not be run if the request is rate-limited.) They can be specified using either the postprocess or enable configuration options. The postprocessor function has the following signature:

def proc(middleware, environ):
    pass

The first argument is the actual middleware object, from which the configuration can be retrieved; the second argument is the WSGI environment.

turnstile.preprocessor

Preprocessors run immediately before searching all the limits. They can be specified using either the preprocess or enable configuration options. The preprocessor function has the following signature:

def proc(middleware, environ):
    pass

The first argument is the actual middleware object, from which the configuration can be retrieved; the second argument is the WSGI environment.
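For example, a preprocessor might copy request details into the WSGI environment so that a custom limit class can use them. In the sketch below, the X-Auth-User header and the example.user environment key are both hypothetical names chosen for illustration. Because exceptions raised by preprocessors are not caught, the function avoids raising when the header is absent.

```python
def tag_user(middleware, environ):
    """Hypothetical preprocessor: expose the caller's identity to limits.

    Copies an assumed X-Auth-User request header into an assumed
    'example.user' environ key, so that a custom limit class could
    rate-limit per user.  Neither name is defined by Turnstile.
    """
    user = environ.get('HTTP_X_AUTH_USER')
    if user:
        environ['example.user'] = user
```

It could be installed with preprocess = mypkg.hooks:tag_user, where mypkg.hooks is a placeholder module path.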

turnstile.redis_client
By default, Turnstile uses a redis.StrictRedis object to communicate with the Redis database. The redis.redis_client configuration value allows this default to be overridden. Alternate classes will be searched for in this entrypoint group, if there is no colon (":") present in the configuration value. See the documentation for redis.StrictRedis for details on this interface.

The Control Daemon

Turnstile stores the limits configuration in the Redis database, in addition to the ephemeral information used to check and enforce the rate limits. This makes it possible to change the limits dynamically from a single, central location. In order to facilitate such changes, each Turnstile instance uses an eventlet thread to run a "control daemon." The control daemon uses the publish/subscribe support provided by Redis to listen for commands, of which two are currently recognized: ping and reload.

Some WSGI servers cannot use Turnstile in this mode, due to using multiple processes (typically through use of the "multiprocessing" Python module). In these circumstances, the control daemon may be started in its own process (see the remote_daemon tool). Enabling this requires that the control.remote configuration option be turned on, and values provided for control.remote.authkey, control.remote.host, and control.remote.port. See the documentation for these options for more information.

It is possible to configure the listening thread of the control daemon to use alternate configuration for connecting to the Redis database. The defaults will be drawn from the [redis] section of the configuration, but by specifying redis.* options in the [control] section of the configuration, specific values may be overridden.

The Ping Command

The "ping" command is the simplest of the control daemon commands. In its simplest form, the message "ping:<channel>" is written to the control channel, which will cause all running Turnstile instances to return the message "pong" to the specified channel. If the control.node_name configuration option has been set, this node name will be included in the response, as "pong:<node name>". Finally, additional data (such as a timestamp) can be included in the "ping" command, as in the message "ping:<channel>:<timestamp>"; this data will be appended to the response, i.e., "pong:<node name>:<timestamp>". This could be used to verify that all nodes are responding and not too heavily loaded.

(Note that if control.node_name is not specified, the response to a "ping" command containing additional data such as a timestamp will be "pong::<timestamp>".)

The turnstile_command tool, described in the section on tools, can be used to send a "ping" command and wait for the responses.
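The reply format can be expressed as a small function. pong_for() is a hypothetical helper that restates the rules above, including the empty node-name field when data follows; it is not the control daemon's own implementation.

```python
def pong_for(message, node_name=None):
    """Compute (reply_channel, reply) for a control-channel ping message.

    "ping:<channel>"        -> "pong" or "pong:<node name>"
    "ping:<channel>:<data>" -> "pong:<node name>:<data>"
    """
    parts = message.split(':')
    if parts[0] != 'ping' or len(parts) < 2:
        raise ValueError('not a ping message: %r' % message)
    channel, extra = parts[1], parts[2:]

    reply = ['pong']
    if node_name or extra:
        # The node-name field is present (possibly empty) whenever a node
        # name is configured or additional data must follow it.
        reply.append(node_name or '')
    reply.extend(extra)
    return channel, ':'.join(reply)
```

To observe real replies, a client would subscribe to the reply channel (for example with redis-py's pubsub() and subscribe()) and then publish a message such as ping:replies to the control channel with publish().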

The Reload Command

The "reload" command is the real reason for the existence of the control daemon. This command causes the current set of limits to be reloaded from the database and presented to the middleware for enforcement.

The simplest form of the reload command is just "reload". If the control.reload_spread configuration option was set, the reload will be scheduled for some time within the configured time interval; otherwise, it will be performed immediately.

The next simplest form of the reload command is "reload:immediate". This causes an immediate reload of the limits, overriding any configured time spread.

The final form of the reload command is "reload:spread:<interval>", where the "<interval>" specifies a time interval, in seconds, over which to spread reloading of the limits. This specified interval is used in preference to that specified by control.reload_spread, if set.

Note that the setup_limits tool automatically initiates a reload once the limits are updated in the database. See the section on tools for more information.
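The three reload forms map onto a delay before reloading, as in the following sketch. reload_delay() is hypothetical, and choosing a uniformly random point within the interval is an assumption about how the spread is applied:

```python
import random


def reload_delay(message, configured_spread=None):
    """Return how many seconds to wait before reloading the limits.

    'configured_spread' plays the role of control.reload_spread.
    """
    parts = message.split(':')
    if parts[0] != 'reload':
        raise ValueError('not a reload message: %r' % message)

    if parts[1:] == ['immediate']:
        return 0.0  # "reload:immediate" ignores any configured spread
    if len(parts) == 3 and parts[1] == 'spread':
        spread = float(parts[2])  # "reload:spread:<interval>" wins
    else:
        spread = configured_spread  # plain "reload"
    if not spread:
        return 0.0
    # Assumption: pick a uniformly random moment within the interval.
    return random.uniform(0, spread)
```

Spreading the reloads this way keeps all nodes from reading the limits out of the Redis database at the same moment.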

The Compactor Daemon

This version of Turnstile includes scalability enhancements which change how bucket data is stored in the Redis database. This eliminates the need for transactions--enabling various Redis clustering tools to be used--but at the cost of increased storage for the bucket data. Buckets are now stored as lists of records; each request processed by Turnstile results in the addition of an "update" record to the bucket representation. Then, to determine whether the request should be rate-limited, the bucket is reconstructed by applying all of the updates.

To prevent this list of records from growing without bound, the rate limiting logic includes a mechanism for triggering the compaction of a bucket--many of these update records are compacted into a single "bucket" record. This is triggered by setting a non-zero value for the compactor.max_updates configuration option. When the number of update records exceeds this threshold, a signal will be sent to the compactor daemon, which performs the actual compaction algorithm.
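The record-list idea can be sketched with a deliberately simplified model. Both the record shapes and the leaky-bucket arithmetic below are assumptions made for illustration and do not reflect Turnstile's actual encoding or bucket algorithm; the point is that the current state is rebuilt by replaying records, and that a summarize request is warranted once the update count exceeds compactor.max_updates.

```python
def replay(records, rate):
    """Rebuild a simplified leaky bucket from a list of records.

    Assumed record shapes (not Turnstile's actual encoding):
      ('bucket', level, last)   a compacted snapshot
      ('update', timestamp)     one request
      ('summarize', timestamp)  a pending compaction marker
    'rate' is how many requests drain from the bucket per second.
    Returns (level, last, update_count).
    """
    level, last, updates = 0.0, None, 0
    for record in records:
        kind = record[0]
        if kind == 'bucket':
            level, last = record[1], record[2]
            updates = 0  # earlier updates are already folded into the snapshot
        elif kind == 'update':
            now = record[1]
            if last is not None:
                level = max(0.0, level - (now - last) * rate)
            level, last = level + 1, now
            updates += 1
    return level, last, updates


def needs_summarize(records, rate, max_updates):
    """True when the update count exceeds max_updates and no summarize
    record is already present in the list."""
    if any(record[0] == 'summarize' for record in records):
        return False
    return replay(records, rate)[2] > max_updates
```

Compaction in this model amounts to replacing the replayed updates with a single ('bucket', level, last) record.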

The compaction logic works by adding special "summarize" records to the bucket representation and placing the bucket's key into a special sorted set. The compactor daemon allows these entries in the sorted set to age for a given period of time (under control of compactor.min_age). Although no new summarize records will be added to the bucket representation if one is already present, there is the potential for multiple Turnstile instances to add one simultaneously; this aging allows all Turnstile instances to see that a summarize request is in progress.

Once a summarize request has aged sufficiently, the compactor daemon will perform the compaction and insert the resulting bucket back into the list representation. It then eliminates the now-extraneous update records.

If a summarize request is lost, due to a compactor daemon (or its host) crashing, the summarize records in the bucket representation have a maximum age as well; once the record exceeds its maximum age, a new summarize request will be generated.
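The timing rules can be summarized in a hedged sketch. Both helpers are hypothetical: summarize_state() classifies a summarize record using the compactor.min_age and compactor.max_age defaults, and ready_keys() uses a dict of bucket keys and submission times as a stand-in for the sorted set named by compactor.compactor_key (treating the sorted-set score as the submission time is an assumption).

```python
def summarize_state(submitted_at, now, min_age=30, max_age=600):
    """Classify a pending summarize request by its age in seconds.

    Defaults mirror compactor.min_age and compactor.max_age.
    """
    age = now - submitted_at
    if age > max_age:
        # The request was probably lost; a new one may be submitted.
        return 'expired'
    if age >= min_age:
        # Old enough that every instance should have seen the marker.
        return 'ready'
    return 'aging'


def ready_keys(pending, now, min_age=30):
    """Return bucket keys whose requests have aged at least min_age
    seconds, oldest first.  'pending' maps key -> submission time."""
    cutoff = now - min_age
    ready = [key for key, t in pending.items() if t <= cutoff]
    return sorted(ready, key=pending.get)
```

Against a live Redis server, the equivalent of ready_keys() would be a query such as client.zrangebyscore('compactor', '-inf', now - min_age) with redis-py, under the same assumption about scores; when it returns nothing, the daemon would sleep for compactor.sleep seconds before trying again.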

Turnstile Tools

The limits are stored in the Redis database using a sorted set, and they are encoded using Msgpack. (Although the Msgpack format is not human-readable, it is very space- and time-efficient, which is why it was chosen for this application.) This makes manual management of the limits configuration more difficult, so Turnstile ships with two tools, dump_limits and setup_limits, to make management of the rate limiting configuration easier. The remote_daemon tool starts up a remote control daemon, for use when Turnstile is used with applications that run multiple processes, such as the nova-api component of OpenStack. Finally, the compactor_daemon tool runs the compactor daemon, and the turnstile_command tool sends arbitrary commands to the running control daemons.

The compactor_daemon Tool

The compactor_daemon tool may be used to start a compactor daemon process. This tool requires the name of an INI-style configuration file; see the section on configuring the tools below for more information.

A usage summary for compactor_daemon:

usage: compactor_daemon [-h] [--log-config LOGGING] [--debug] config

Run the compactor daemon.

positional arguments:
  config                Name of the configuration file.

optional arguments:
  -h, --help            show this help message and exit
  --log-config LOGGING, -l LOGGING
                        Specify a logging configuration file.
  --debug, -d           Run the tool in debug mode.

The dump_limits Tool

The dump_limits tool may be used to dump the current limits in the database into an XML representation. This tool requires the name of an INI-style configuration file; see the section on configuring the tools below for more information.

A usage summary for dump_limits:

usage: dump_limits [-h] [--debug] config limits_file

Dump the current limits from the Redis database.

positional arguments:
  config       Name of the configuration file, for connecting to the Redis
               database.
  limits_file  Name of the XML file that the limits will be dumped to.

optional arguments:
  -h, --help   show this help message and exit
  --debug, -d  Run the tool in debug mode.

The remote_daemon Tool

The remote_daemon tool may be used to start a separate control daemon process. This tool requires the name of an INI-style configuration file; see the section on configuring the tools below for more information. Note that, in addition to the required Redis configuration values, configuration values for the control.remote.authkey, control.remote.host, and control.remote.port options must be provided.

A usage summary for remote_daemon:

usage: remote_daemon [-h] [--log-config LOGGING] [--debug] config

Run the external control daemon.

positional arguments:
  config                Name of the configuration file.

optional arguments:
  -h, --help            show this help message and exit
  --log-config LOGGING, -l LOGGING
                        Specify a logging configuration file.
  --debug, -d           Run the tool in debug mode.

The setup_limits Tool

The setup_limits tool may be used to read an XML file (such as that produced by dump_limits) and load the rate limiting configuration into the Redis database. This tool requires the name of an INI-style configuration file; see the section on configuring the tools below for more information.

A usage summary for setup_limits:

usage: setup_limits [-h] [--debug] [--dryrun] [--noreload]
                    [--reload-immediate] [--reload-spread SECS]
                    config limits_file

Set up or update limits in the Redis database.

positional arguments:
  config                Name of the configuration file, for connecting to the
                        Redis database.
  limits_file           Name of the XML file describing the limits to
                        configure.

optional arguments:
  -h, --help            show this help message and exit
  --debug, -d           Run the tool in debug mode.
  --dryrun, --dry_run, --dry-run, -n
                        Perform a dry run; inhibits loading data into the
                        database.
  --noreload, -R        Inhibit issuing a reload command.
  --reload-immediate, -r
                        Cause all nodes to immediately reload the limits
                        configuration.
  --reload-spread SECS, -s SECS
                        Cause all nodes to reload the limits configuration
                        over the specified number of seconds.

The turnstile_command Tool

The turnstile_command tool may be used to send arbitrary commands to all running control daemons. This tool requires the name of an INI-style configuration file; see the section on configuring the tools below for more information.

A usage summary for turnstile_command:

usage: turnstile_command [-h] [--listen CHANNEL] [--debug]
                         config command [arguments [arguments ...]]

Issue a command to all running control daemons.

positional arguments:
  config                Name of the configuration file.
  command               The command to execute. Note that 'ping' is handled
                        specially; in particular, the --listen parameter is
                        implied.
  arguments             The arguments to pass for the command. Note that the
                        colon character (':') cannot be used.

optional arguments:
  -h, --help            show this help message and exit
  --listen CHANNEL, -l CHANNEL
                        A channel to listen on for the command responses. Use
                        C-c (or your system's keyboard interrupt sequence) to
                        stop waiting for responses.
  --debug, -d           Run the tool in debug mode.
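
The restriction on colons in command arguments suggests that the command name and its arguments are joined with colons into a single message before being broadcast to the control daemons. The following is a minimal sketch under that assumption; build_command_message() is a hypothetical helper, not part of Turnstile's API.

```python
def build_command_message(command, *args):
    """Join a command and its arguments with colons (assumed wire format).

    Hypothetical helper: it only illustrates why ':' cannot appear in
    arguments, since a receiver splitting on ':' could not tell a literal
    colon from a separator.
    """
    parts = [command] + [str(arg) for arg in args]
    for part in parts:
        if ":" in part:
            raise ValueError("colons are not allowed in commands or "
                             "arguments: %r" % part)
    return ":".join(parts)
```

With a redis-py client, a message built this way could then be broadcast with db.publish(channel, message), where channel is the configured control channel.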

Configuring the Tools

All of the tools require an INI-style configuration file, which specifies how to connect to the Redis database. This file should contain the section "[redis]" and should be populated with the same "redis.*" options as the PasteDeploy configuration file, minus the "redis." prefix. For example:

[redis]
host = <your Redis database host name or IP>

Each "redis.*" option recognized by the Turnstile middleware is understood by the tools.

Additional options may be provided, such as the control channel, limits key, and the compactor_daemon and remote_daemon options. The configuration file should be compatible with the alternate configuration file described under the config configuration option for the Turnstile middleware.
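
Because the tools' [redis] section uses the same options as the PasteDeploy "redis.*" keys with the prefix stripped, a tools configuration file can be derived mechanically from the middleware's options. This is a minimal sketch using the standard configparser module; tools_config_from_paste() is a hypothetical helper, and it covers only the [redis] section, not the additional options mentioned above.

```python
import configparser
import io


def tools_config_from_paste(options):
    """Build INI text with a [redis] section from PasteDeploy-style options.

    Hypothetical helper: keeps only keys starting with "redis." and strips
    that prefix, e.g. "redis.host" becomes "host".
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    parser["redis"] = {
        key[len("redis."):]: value
        for key, value in options.items()
        if key.startswith("redis.")
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

For example, passing {"redis.host": "localhost", "status": "..."} yields a [redis] section containing only "host = localhost"; non-Redis options are left out.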

Rate Limit XML

The XML file used for expressing rate limit configuration is relatively straightforward, or at least as straightforward as XML can be. The top-level element is <limits>; this should contain a sequence of <limit> elements, each containing a number of <attr> elements. The specific attributes available for any given limit class depend on the exact class, but that information is documented in the attrs attribute of the limit class. (This information is suitable for introspection.)

The <limit> element has one required XML attribute, class, which identifies the desired limit class. Its value must be either a "module:class" string or a single name corresponding to an entrypoint in the "turnstile.limit" entrypoint group. The <attr> element also has a single required XML attribute, name, which identifies the name of the Limit attribute; the contents of the <attr> element give the value for that attribute.

Some limit attributes are lists; for these attributes, the <attr> element must contain one or more <value> elements, whose contents identify a single item in the attribute list. Other limit attributes are dictionaries; for these attributes, again the <attr> element must contain one or more <value> elements, but now those <value> elements must have the XML attribute key set to the dictionary key corresponding to that value.
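
The three shapes of <attr> content (plain text, a list of <value> elements, or a dictionary of keyed <value> elements) can be illustrated with the standard xml.etree.ElementTree module. This is a minimal sketch, not the setup_limits implementation: parse_limits() is a hypothetical helper that infers each attribute's shape from the markup itself and leaves every value as a string, whereas the real tool presumably relies on the limit class's attrs to interpret values. The embedded sample mirrors the example configuration in this section.

```python
import xml.etree.ElementTree as ET

SAMPLE_LIMITS = """<limits>
  <limit class="turnstile.limits:Limit">
    <attr name="requirements">
      <value key="pageid">[0-9]+</value>
    </attr>
    <attr name="unit">second</attr>
    <attr name="uri">/page/{pageid}</attr>
    <attr name="value">10</attr>
    <attr name="verbs">
      <value>GET</value>
    </attr>
  </limit>
</limits>"""


def _text(element):
    return (element.text or "").strip()


def parse_limits(xml_text):
    """Return a list of (class_name, attrs) tuples (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    limits = []
    for limit_el in root.findall("limit"):
        attrs = {}
        for attr_el in limit_el.findall("attr"):
            name = attr_el.get("name")
            values = attr_el.findall("value")
            if not values:
                # Scalar attribute: the element's own text is the value.
                attrs[name] = _text(attr_el)
            elif all("key" in v.attrib for v in values):
                # Dictionary attribute: each <value> carries a key.
                attrs[name] = {v.get("key"): _text(v) for v in values}
            else:
                # List attribute: each <value> is one list item.
                attrs[name] = [_text(v) for v in values]
        limits.append((limit_el.get("class"), attrs))
    return limits
```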

As an example, consider the following limits configuration:

<?xml version='1.0' encoding='UTF-8'?>
<limits>
  <limit class="turnstile.limits:Limit">
    <attr name="requirements">
      <value key="pageid">[0-9]+</value>
    </attr>
    <attr name="unit">second</attr>
    <attr name="uri">/page/{pageid}</attr>
    <attr name="value">10</attr>
    <attr name="verbs">
      <value>GET</value>
    </attr>
  </limit>
</limits>

In this example, GET access to /page/{pageid} is rate-limited to 10 per second. The requirements attribute may be used to specify regular expressions to tune the matching of URI components; in this case, the {pageid} value must be composed of 1 or more digits. The limit class used is the basic turnstile.limits:Limit limit class.
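
To make the effect of the requirements attribute concrete, the sketch below turns a URI template such as /page/{pageid} into a regular expression, substituting each requirement for its placeholder. Turnstile performs this matching with its own routing machinery; compile_uri_template() and limit_applies() are hypothetical helpers that only illustrate the semantics, and the default pattern for placeholders without a requirement ("[^/]+") is an assumption.

```python
import re


def compile_uri_template(uri, requirements=None):
    """Compile '/page/{pageid}' plus requirements into a regex (illustration)."""
    requirements = requirements or {}
    pattern = ""
    pos = 0
    for match in re.finditer(r"\{(\w+)\}", uri):
        # Literal text before the placeholder is matched verbatim.
        pattern += re.escape(uri[pos:match.start()])
        name = match.group(1)
        # Assumed default: a placeholder matches one path segment.
        pattern += "(?P<%s>%s)" % (name, requirements.get(name, "[^/]+"))
        pos = match.end()
    pattern += re.escape(uri[pos:])
    return re.compile(pattern)


def limit_applies(compiled, verbs, method, path):
    """True if the request method is listed and the whole path matches."""
    return method.upper() in verbs and compiled.fullmatch(path) is not None
```

With the example configuration, compile_uri_template("/page/{pageid}", {"pageid": "[0-9]+"}) matches /page/42 but not /page/abc or /page/42/edit.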

Custom Limit Classes

All limit classes must descend from turnstile.limits:Limit. This admittedly un-Pythonic requirement has a number of advantages, including the machinery that allows limits to be stored in the Redis database. Most limit classes only need to deal with the attrs class attribute and the filter() method, although the route() and format() methods may also be hooked. For more information about these methods, see the docstrings provided for their default implementations in turnstile.limits:Limit.

Accessing the Turnstile Configuration

The Turnstile configuration is available to preprocessors and to Limit classes. For preprocessors, it is available directly from the middleware object (the first parameter passed) via the conf attribute. (The database handle is also available via the db attribute, should access to the database be required.) For the filter() method of Limit classes, the configuration is available in the request environment under the turnstile.conf key.

The Turnstile configuration is represented as a turnstile.config:Config object. Configuration keys that do not contain a "." are available as attributes of this object; for example, to obtain the configured status value, assuming the Turnstile configuration is available in the conf variable, the correct code would be:

status = conf.status

For those configuration keys which do contain a ".", the part of the name to the left of the first "." becomes a dictionary key, and the remainder of the name will be a second key. For example, to access the value of the redis.connection_pool.connection_class variable, the correct code would be:

connection_class = conf['redis']['connection_pool.connection_class']

All values in the configuration are stored as strings. Configuration values do not need to be pre-declared in any way; Turnstile ignores (but maintains) configuration values that it does not use, making these values available for use by preprocessors and Limit subclasses.
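
The key layout described above (undotted keys as attributes, dotted keys split at the first "." into a section and a sub-key) can be modeled with a small class. ConfigSketch is a hypothetical stand-in written only to illustrate that mapping; it is not turnstile.config:Config, and its behavior for missing keys (AttributeError or KeyError) is an assumption.

```python
class ConfigSketch(object):
    """Hypothetical stand-in mimicking the configuration key layout."""

    def __init__(self, conf_dict):
        self._top = {}       # keys without a "."
        self._sections = {}  # "section" -> {"rest.of.key": value}
        for key, value in conf_dict.items():
            if "." in key:
                # Split only at the first ".": "redis.a.b" -> "redis", "a.b".
                section, _, rest = key.partition(".")
                self._sections.setdefault(section, {})[rest] = value
            else:
                self._top[key] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard private names so a
        # partially initialized object cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._top[name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, section):
        return self._sections[section]
```

Under this model, conf.status and conf['redis']['connection_pool.connection_class'] behave as described, and every value stays a string.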

For convenience, the turnstile.config:Config class offers a static method to_bool() which can convert a string value to a boolean value. The strings "t", "true", "on", "y", and "yes" are all recognized as a boolean True value, as are numeric strings which evaluate to non-zero values. The strings "f", "false", "off", "n", and "no" are all recognized as a boolean False value, as are numeric strings which evaluate to zero values. Any other string value will cause to_bool() to raise a ValueError, unless the do_raise argument is given as False, in which case to_bool() will return a boolean False value.
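
The conversion rules for to_bool() can be expressed as a short function. to_bool_sketch() is a hypothetical re-implementation of the rules stated above, not Turnstile's code; treating input case-insensitively, ignoring surrounding whitespace, and accepting only integer numeric strings are assumptions.

```python
_TRUE_STRINGS = frozenset(["t", "true", "on", "y", "yes"])
_FALSE_STRINGS = frozenset(["f", "false", "off", "n", "no"])


def to_bool_sketch(value, do_raise=True):
    """Convert a configuration string to a bool per the documented rules."""
    text = value.strip().lower()  # assumption: case-insensitive matching
    # Numeric strings: non-zero is True, zero is False (integers assumed).
    try:
        return int(text) != 0
    except ValueError:
        pass
    if text in _TRUE_STRINGS:
        return True
    if text in _FALSE_STRINGS:
        return False
    if do_raise:
        raise ValueError("unrecognized boolean string: %r" % value)
    return False
```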

Determining User Buckets

Some applications need to tell users when they will next be able to make a call against a given URI, often as part of listing the limits that apply to that user. This requires access to that user's bucket data. Under previous versions of Turnstile, this could only be accomplished with the Redis "KEYS" command, which does not scale. Turnstile now allows preprocessors to place the name of a sorted set in the WSGI environment variable turnstile.bucket_set; if this variable is set when a limit is processed, Turnstile stores the bucket key that was used in the named sorted set. The score of each entry is the bucket's expiration time, which can be used to remove the entries for expired buckets from the sorted set.

Applications that have this requirement should implement both a preprocessor and a postprocessor; the preprocessor should set turnstile.bucket_set to an appropriate value, and the postprocessor should trim off the outdated entries from the named sorted set and load the buckets, performing whatever processing is necessary to make the data available to the application.
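
A preprocessor/postprocessor pair along these lines might look like the sketch below. Several details are assumptions: both hooks are taken to receive the middleware object and the WSGI environment (mirroring how preprocessors receive the middleware as their first parameter), the per-user set name built from REMOTE_USER and the myapp.bucket_keys environment key are hypothetical, and midware.db is assumed to be a redis-py-compatible client. Decoding the bucket data itself is left to the application.

```python
import time

BUCKET_SET_VAR = "turnstile.bucket_set"  # environment variable from the text


def bucket_set_preprocessor(midware, environ):
    # Hypothetical naming scheme: one sorted set per authenticated user.
    user = environ.get("REMOTE_USER", "anonymous")
    environ[BUCKET_SET_VAR] = "bucket_set:%s" % user


def bucket_set_postprocessor(midware, environ):
    set_name = environ.get(BUCKET_SET_VAR)
    if not set_name:
        return
    # Scores are bucket expiration times, so entries scored at or before
    # "now" refer to buckets that have already expired.
    midware.db.zremrangebyscore(set_name, "-inf", time.time())
    # Hypothetical environment key for the application to consume.
    environ["myapp.bucket_keys"] = midware.db.zrange(set_name, 0, -1)
```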

Backwards Compatibility and Interoperability

This version of Turnstile includes several enhancements, such as the addition of postprocessors and the enable configuration value. For the vast majority of these enhancements, backwards compatibility has been preserved; if you see an issue caused by lack of backwards compatibility, please log it as a bug.

There are, however, several features that were deprecated in previous versions of Turnstile and have now been removed:

  • The special treatment of the [connection] section of the configuration is removed; users should use the options in the [redis] and [control] sections.
  • The turnstile.config variable in the WSGI environment is removed; users should use the turnstile.conf variable instead.
  • The config property of the middleware object is removed; users should use the conf attribute instead.
  • The import_class() function of turnstile.utils is removed; users should use the find_entrypoint() function instead.
  • The TurnstileRedis class of turnstile.database is removed, along with its safe_update(), limit_update(), and command() methods. The latter two have been replaced by limit_update() and command() functions declared in the turnstile.database module. There is no replacement for safe_update().

The following features have been deprecated and will be removed in future versions of Turnstile:

  • Overriding the TurnstileMiddleware class with the turnstile configuration option is deprecated; users should use the formatter option to override delay formatting.
  • The decode() method of Limit classes is deprecated. Use the BucketKey class in turnstile.limits to decode bucket keys.
  • Except for the setup_limits tool's XML input file, the specification of functions and classes using "module:function" or "module:class" syntax is deprecated; Turnstile is moving to a pkg_resources entrypoint-based approach. See the section on entrypoints above for more information.

Interoperability with Older Versions of Turnstile

This version of Turnstile is not fully interoperable with older versions. Care has been taken to ensure that new and old instances of Turnstile can run against the same database; however, old versions cannot load bucket data written by new versions, and vice versa. Run both versions side by side only during a transitional period, not for an extended period of time.

The bucket storage format has changed; the new format improves Turnstile's scalability by eliminating the use of transactions when storing bucket data. To allow a phased transition to the new version, the bucket keys have also changed. As a result, rate limits are applied to users hitting the new version of Turnstile independently of those applied to users hitting the old version, so a user may be able to make up to twice as many requests as the rate limits permit. Completing the transition quickly minimizes this problem.