
Ledge

A Lua application for OpenResty, providing HTTP cache functionality for Nginx, using Redis as a cache / metadata store.

This offers Squid / Varnish like functionality and performance, directly within your Nginx server, coupled with the flexibility to script configuration dynamically.

Status

Under active development, functionality may change without much notice. Please feel free to raise issues / request features at https://github.com/pintsized/ledge/issues.

Features

  • Cache items and metadata stored in Redis.
  • Configurable max memory limits for entities.
  • Redis automatic failover with Sentinel.
  • Event hooks to override cache policies at various stages using Lua script.
  • End-to-end revalidation (specific and unspecified).
  • Range requests (single and multipart).
  • Offline modes (bypass, avoid).
  • Stale-if-error (serves stale content on upstream error).
  • Serve stale content for an additional stale period.
  • Background revalidation (triggered by stale responses and upstream partial responses).
  • Collapsed forwarding (concurrent similar requests collapsed into a single upstream request).
  • Caching POST responses (serve-able to subsequent GET / HEAD requests).
  • PURGE requests to expire resources by URI (also supports wildcard patterns).
  • ESI 1.0 support. See documentation for exceptions.
  • Store gzipped responses and dynamically gunzip when Accept-Encoding: gzip is not present.

Installation

Download and install:

Review the lua-nginx-module documentation on how to run Lua code in Nginx. If you are new to OpenResty, it's important to take the time to do this properly, as the environment is quite specific. Note that LuaJIT must be enabled (which is the default).

Clone this repo, and the following dependencies into a path defined by lua_package_path:

Enable the lua_check_client_abort directive to avoid orphaned connections to both the origin and Redis, and ensure if_modified_since is set to Off.

Minimal configuration

A minimal configuration involves loading the module during init_by_lua, starting workers during init_worker_by_lua, configuring your upstream, and invoking Ledge during content_by_lua.

This requires that you have Redis running locally on the default port.

http {
    if_modified_since Off;
    lua_check_client_abort On;
    resolver 8.8.8.8;

    lua_package_path '/path/to/lua-resty-http/?.lua;/path/to/lua-resty-redis-connector/?.lua;/path/to/lua-resty-qless/?.lua;/path/to/lua-resty-cookie/?.lua;/path/to/ledge/?.lua;;';

    init_by_lua '
        local ledge_m = require "ledge.ledge"
        ledge = ledge_m.new()
        ledge:config_set("upstream_host", "HOST.EXAMPLE.COM")
    ';

    init_worker_by_lua 'ledge:run_workers()';

    server {
        location / {
            content_by_lua 'ledge:run()';
        }
    }
}

Configuration options

Options can be specified globally during init_by_lua, or for a specific server/location during content_by_lua, before calling ledge:run().

Config set during content_by_lua will only affect that specific location, and runs in the context of the current running request. That is, you can write request-specific conditions which dynamically set configuration for matching requests.
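For example, a sketch of request-specific configuration during content_by_lua (the paths and values here are purely illustrative):

```lua
content_by_lua '
    -- Illustrative only: tune config per request before running Ledge.
    -- The /api prefix and the 86400 value are hypothetical examples.
    if ngx.var.uri:find("^/api") then
        ledge:config_set("stale_if_error", 86400)
    end
    ledge:run()
';
```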

origin_mode

syntax: ledge:config_set("origin_mode", ledge.ORIGIN_MODE_NORMAL | ledge.ORIGIN_MODE_BYPASS | ledge.ORIGIN_MODE_AVOID)

default: ledge.ORIGIN_MODE_NORMAL

Determines the overall behaviour for connecting to the origin. ORIGIN_MODE_NORMAL will assume the origin is up, and connect as necessary. ORIGIN_MODE_AVOID is similar to Squid's offline_mode, where any retained cache (expired or not) will be served rather than trying the origin, regardless of cache-control headers, but the origin will be tried if there is no cache to serve. ORIGIN_MODE_BYPASS is the same as AVOID, except if there is no cache to serve we send a 503 Service Unavailable status code to the client and never attempt an upstream connection.

upstream_connect_timeout

syntax: ledge:config_set("upstream_connect_timeout", 1000)

default: 500 (ms)

Maximum time to wait for an upstream connection (in milliseconds). If it is exceeded, we send a 503 status code, unless stale_if_error is configured.

upstream_read_timeout

syntax: ledge:config_set("upstream_read_timeout", 5000)

default: 5000 (ms)

Maximum time to wait for data on a connected upstream socket (in milliseconds). If it is exceeded, we send a 503 status code, unless stale_if_error is configured.

upstream_host

syntax: ledge:config_set("upstream_host", "web01.example.com")

default: empty (must be set)

Specifies the hostname or IP address of the upstream host. If a hostname is specified, you must configure the Nginx resolver somewhere, for example:

resolver 8.8.8.8;

upstream_port

syntax: ledge:config_set("upstream_port", 80)

default: 80

Specifies the port of the upstream host.

upstream_use_ssl

syntax: ledge:config_set("upstream_use_ssl", true)

default: false

Toggles the use of SSL on the upstream connection. Other upstream_ssl_* options will be ignored if this is not set to true.

upstream_ssl_server_name

syntax: ledge:config_set("upstream_ssl_server_name", "www.example.com")

default: nil

Specifies the SSL server name used for Server Name Indication (SNI). See sslhandshake for more information.

upstream_ssl_verify

syntax: ledge:config_set("upstream_ssl_verify", true)

default: false

Toggles SSL verification. See sslhandshake for more information.

use_resty_upstream

syntax: ledge:config_set("use_resty_upstream", true)

default: false

Toggles whether to use a preconfigured lua-resty-upstream instance (see below), instead of the above upstream_* options.

resty_upstream

syntax: ledge:config_set("resty_upstream", my_upstream)

default: nil

Specifies a preconfigured lua-resty-upstream instance to be used for all upstream connections. This provides upstream load balancing and active healthchecks.

buffer_size

syntax: ledge:config_set("buffer_size", 2^17)

default: 2^16 (64KB in bytes)

Specifies the internal buffer size (in bytes) used for data to be read/written/served. Upstream responses are read in chunks of this maximum size, preventing allocation of large amounts of memory in the event of receiving large files. Data is also stored internally as a list of chunks, and delivered to the Nginx output chain buffers in the same fashion.

The only exception is if ESI is configured, and Ledge has determined there are ESI instructions to process, and any of these instructions span a given chunk. In this case, buffers are concatenated until a complete instruction is found, and then ESI operates on this new buffer.

cache_max_memory

syntax: ledge:config_set("cache_max_memory", 4096)

default: 2048 (KB)

Specifies (in kilobytes) the maximum size a cache item can occupy before we give up attempting to store it (and delete the entity).

Note that since entities are written and served as a list of buffers, when replacing an entity we create a new entity list and only delete the old one after existing read operations should have completed, marking the old entity for garbage collection.

As a result, it is possible for multiple entities for a given cache key to exist, each up to a maximum of cache_max_memory. However, this should only ever happen temporarily; the timing window is configurable with minimum_old_entity_download_rate.

advertise_ledge

syntax: ledge:config_set("advertise_ledge", false)

default: true

If set to false, disables advertising the software name and version, e.g. (ledge/1.00), in the Via response header.

redis_database

syntax: ledge:config_set("redis_database", 1)

default: 0

Specifies the Redis database to use for cache data / metadata.

redis_qless_database

syntax: ledge:config_set("redis_qless_database", 2)

default: 1

Specifies the Redis database to use for lua-resty-qless jobs. These are background tasks such as garbage collection and revalidation, which are managed by Qless. It can be useful to keep these in a separate database, purely for namespace sanity.

redis_connect_timeout

syntax: ledge:config_set("redis_connect_timeout", 1000)

default: 500 (ms)

Maximum time to wait for a Redis connection (in milliseconds). If it is exceeded, we send a 503 status code.

redis_read_timeout

syntax: ledge:config_set("redis_read_timeout", 5000)

default: 5000 (ms)

Maximum time to wait for data on a connected Redis socket (in milliseconds). If it is exceeded, we send a 503 status code.

redis_host

syntax: ledge:config_set("redis_host", { host = "127.0.0.1", port = 6380 })

default: { host = "127.0.0.1", port = 6379, password = nil, socket = nil }

Specifies the Redis host to connect to. If socket is specified then host and port are ignored. See the lua-resty-redis documentation for more details.
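For example, a sketch of connecting over a unix domain socket instead of TCP (the socket path is illustrative; host and port are ignored when socket is set):

```lua
-- Connect to Redis via a unix domain socket (path is hypothetical).
ledge:config_set("redis_host", {
    socket = "unix:/var/run/redis/redis.sock",
})
```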

redis_use_sentinel

syntax: ledge:config_set("redis_use_sentinel", true)

default: false

Toggles the use of Redis Sentinel for Redis host discovery. If set to true, then redis_sentinels will override redis_host.

redis_sentinel_master_name

syntax: ledge:config_set("redis_sentinel_master_name", "master")

default: mymaster

Specifies the Redis Sentinel master name.

redis_sentinels

syntax: ledge:config_set("redis_sentinels", { { host = "127.0.0.1", port = 6381 }, { host = "127.0.0.1", port = 6382 }, { host = "127.0.0.1", port = 6383 }, })

default: nil

Specifies a list of Redis Sentinels to be tried in order. Once connected, Sentinel provides us with a master Redis node to connect to. If it cannot identify a master, or if the master node cannot be connected to, we ask Sentinel for a list of slaves to try. This normally happens when the master has gone down, but Sentinel has not yet promoted a slave. During this window, we optimistically try to connect to a slave for read-only operations, since cache-hits may still be served.

keep_cache_for

syntax: ledge:config_set("keep_cache_for", 86400 * 14)

default: 86400 * 30 (1 month in seconds)

Specifies how long to retain cache data past its expiry date. This allows us to serve stale cache in the event of upstream failure with stale_if_error or origin_mode settings.

Items will be evicted when under memory pressure provided you are using one of the Redis volatile eviction policies, so there should generally be no real need to lower this for space reasons.

Items at the extreme end of this (i.e. nearly a month old) are clearly very rarely requested, or more likely, have been removed at the origin.

minimum_old_entity_download_rate

syntax: ledge:config_set("minimum_old_entity_download_rate", 128)

default: 56 (kbps)

Clients reading slower than this, who are also unfortunate enough to have started reading from an entity which has since been replaced (e.g. due to another client triggering a revalidation), may have their entity garbage collected before they finish, resulting in an incomplete resource being delivered.

Lowering this is fairer on slow clients, but widens the potential window for multiple old entities to stack up, which in turn could threaten Redis storage space and force evictions.

This design favours high availability (since there are no read-locks, we can serve cache from Redis slaves in the event of failure) on the assumption that the chances of this causing incomplete resources to be served are quite low.

max_stale

syntax: ledge:config_set("max_stale", 300)

default: nil

Specifies, in seconds, how far past expiry we can serve cached content. If a value is specified by the Cache-Control: max-stale=xx request header, then this setting is ignored, placing control in the client's hands.

This setting is useful for serving expensive content stale whilst revalidating in the background. For example, if some content has a TTL of one hour, you may wish to change this to 45 minutes, and allow stale serving for 15 minutes. Thus the cache item has the same effective TTL, but any requests in the last 15 minutes will be served quickly, and trigger a background revalidation for the latest version.
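The example above might be sketched as follows (assuming the origin sets a 45-minute max-age):

```lua
-- Allow serving up to 15 minutes past expiry. Combined with a
-- 45-minute TTL from the origin, the effective TTL is still one hour,
-- but requests in the final 15 minutes are served (stale) immediately
-- and trigger a background revalidation.
ledge:config_set("max_stale", 60 * 15)
```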

WARNING: Any setting other than nil may violate the HTTP specification (i.e. if the client does not override it with a valid request header value).

stale_if_error

syntax: ledge:config_set("stale_if_error", 86400)

default: nil

Specifies, in seconds, how far past expiry to serve stale cached content if the origin returns an error.

This can be overriden by the request using the stale-if-error Cache-Control extension.

cache_key_spec

syntax: ledge:config_set("cache_key_spec", { ngx.var.host, ngx.var.uri, ngx.var.args })

default: { ngx.var.scheme, ngx.var.host, ngx.var.uri, ngx.var.args }

Specifies the cache key format. This allows you to abstract certain items for greater hit rates (at the expense of collisions).

The default spec is:

{ ngx.var.scheme, ngx.var.host, ngx.var.uri, ngx.var.args }

Which will generate cache keys in Redis such as:

ledge:cache_obj:http:example.com:/about
ledge:cache_obj:http:example.com:/about:p=2&q=foo

If you're doing SSL termination at Nginx and your origin pages look the same for HTTPS and HTTP traffic, you could provide a cache key spec omitting ngx.var.scheme, to avoid splitting the cache when the content is identical.
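For example, a spec omitting the scheme might look like this (otherwise identical to the default):

```lua
-- ngx.var.scheme omitted: HTTP and HTTPS share a single cache entry,
-- avoiding a split cache when the content is identical.
ledge:config_set("cache_key_spec", {
    ngx.var.host,
    ngx.var.uri,
    ngx.var.args,
})
```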

enable_collapsed_forwarding

syntax: ledge:config_set("enable_collapsed_forwarding", true)

default: false

With collapsed forwarding enabled, Ledge will attempt to collapse concurrent origin requests for known (previously) cacheable resources into single upstream requests.

This is useful in reducing load at the origin if requests are expensive. The longer the origin request, the more useful this is, since the greater the chance of concurrent requests.

Ledge won't collapse requests for resources that it hasn't seen before, or that weren't cacheable last time. If the resource has become non-cacheable since the last request, the waiting requests will go to the origin themselves (having waited on the first request to find this out).

collapsed_forwarding_window

syntax: ledge:config_set("collapsed_forwarding_window", 30000)

default: 60000 (ms)

When collapsed forwarding is enabled, if a fatal error occurs during the origin request, the collapsed requests may never receive the response they are waiting for. This setting puts a limit on how long they will wait, and how long before new requests will decide to try the origin for themselves.

If this is set shorter than your origin takes to respond, then you may get more upstream requests than desired. Fatal errors (server reboot etc) may result in hanging connections for up to the maximum time set. Normal errors (such as upstream timeouts) work independently of this setting.

esi_enabled

syntax: ledge:config_set("esi_enabled", true)

default: false

Toggles ESI scanning and processing, though behaviour is also contingent upon esi_content_types and esi_surrogate_delegation settings, as well as Surrogate-Control / Surrogate-Capability headers.

ESI instructions are detected on the slow path (i.e. when fetching from the origin), so only instructions which are known to be present are processed on cache HITs.

All features documented in the ESI 1.0 Language Specification are supported, with the following exceptions:

  • <esi:inline> not implemented (or advertised as a capability).
  • No support for the onerror or alt attributes for <esi:include>. Instead, we "continue" on error by default.
  • <esi:try | attempt | except> not implemented.
  • The "dictionary (special)" substructure variable type for HTTP_USER_AGENT is not implemented.

esi_content_types

syntax: ledge:config_set("esi_content_types", { "text/html", "text/javascript" })

default: { "text/html" }

Specifies content types to perform ESI processing on. All other content types will not be considered for processing.

esi_surrogate_delegation

syntax: ledge:config_set("esi_surrogate_delegation", true)

default: false

ESI Surrogate Delegation allows downstream intermediaries to advertise a capability to process ESI instructions nearer to the client. When set to true, any downstream offering this capability will disable ESI processing in Ledge, delegating it downstream.

When set to a Lua table of IP address strings, delegation will only be allowed to these specific hosts. This may be important if ESI instructions contain sensitive data which must be removed.
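A sketch of restricting delegation to specific hosts (the addresses are illustrative):

```lua
-- Only these downstream caches may take over ESI processing.
ledge:config_set("esi_surrogate_delegation", {
    "10.0.0.5",
    "10.0.0.6",
})
```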

esi_recursion_limit

syntax: ledge:config_set("esi_recursion_limit", 5)

default: 10

Limits fragment inclusion nesting, to avoid accidental infinite recursion.

esi_pre_include_callback

syntax: ledge:config_set("esi_pre_include_callback", function(req_params) ... end)

default: nil

A function provided here will be called each time the ESI parser goes to make an outbound HTTP request for a fragment. The request parameters are passed through and can be manipulated here, for example to modify request headers.
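For example, a hedged sketch of a callback adding a header to fragment requests (we assume req_params follows the lua-resty-http request parameter table with a headers sub-table; the header name is hypothetical):

```lua
ledge:config_set("esi_pre_include_callback", function(req_params)
    -- Assumes req_params carries the request headers in a "headers"
    -- sub-table, as per lua-resty-http. The header itself is illustrative.
    req_params.headers = req_params.headers or {}
    req_params.headers["X-Fragment-Request"] = "1"
end)
```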

gunzip_enabled

syntax: ledge:config_set("gunzip_enabled", false)

default: true

With this enabled, gzipped responses will be uncompressed on the fly for clients that do not set Accept-Encoding: gzip. Note that if we receive a gzipped response for a resource containing ESI instructions, we gunzip whilst saving and store uncompressed, since we need to read the ESI instructions.

Also note that Range requests for gzipped content must be ignored - the full response will be returned.

keyspace_scan_count

syntax: ledge:config_set("keyspace_scan_count", 10000)

default: 1000

Tunes the behaviour of keyspace scans, which occur when sending a PURGE request with wildcard syntax. A higher number may be better if latency to Redis is high and the keyspace is large.

Workers

Ledge uses qless and the lua-resty-qless binding for scheduling background tasks, managed by Redis.

Currently, there is only one job type, which is the garbage collection job for replaced entities, and it is imperative that this runs.

run_workers

syntax: init_worker_by_lua 'ledge:run_workers(options)';

default options: { interval = 10, concurrency = 1 }

Starts the Ledge workers within each Nginx worker process. When no jobs are left to be processed, each worker will wait for interval before checking again.

You can have many worker "light threads" per worker process, by upping the concurrency. They will yield to each other when doing i/o.

The default options are quite conservative. You probably want to up the concurrency and lower the interval on busy systems.
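For a busier system, the options might be tuned along these lines (the values are illustrative):

```lua
init_worker_by_lua '
    -- Poll more frequently and run several worker "light threads"
    -- per Nginx worker process; they yield to each other on i/o.
    ledge:run_workers({ interval = 1, concurrency = 4 })
';
```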

Events

Events are broadcast at various stages, which can be listened for using Lua functions. A response table is passed through to your function, providing the opportunity to manipulate the response as needed.

For example, this may be useful if an upstream doesn't set optimal Cache-Control headers, and cannot easily be modified itself.

Note that the response body itself is not available, since this is streamed at the point of serving.

Example:

ledge:bind("origin_fetched", function(res)
    -- Add some cache headers.  Ledge will assume they came from the origin.
    res.header["Cache-Control"] = "max-age=" .. 86400
    res.header["Last-Modified"] = ngx.http_time(ngx.time())
end)

Note that creating closures in Lua can be relatively expensive, so you may wish to define these functions in a module and pass them through.
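A minimal sketch of the module approach (the module name and path are hypothetical):

```lua
-- e.g. in /path/to/my/handlers.lua (hypothetical module)
local _M = {}

function _M.origin_fetched(res)
    -- Defined once at load time, rather than as a per-request closure.
    res.header["Cache-Control"] = "max-age=" .. 86400
end

return _M

-- Then, during init_by_lua:
--   local handlers = require "handlers"
--   ledge:bind("origin_fetched", handlers.origin_fetched)
```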

bind

syntax: ledge:bind(event_name, callback)

Binds a user defined function to an event.

Event types

cache_accessed

syntax: ledge:bind("cache_accessed", function(res) -- end)

params: res The cached response table (does not include the body).

Fires directly after the response was successfully loaded from cache.

origin_required

syntax: ledge:bind("origin_required", function() -- end)

params: nil

Fires when we have decided that a request to the origin is required.

before_request

syntax: ledge:bind("before_request", function(req_params) -- end)

params: req_params. The table of request params about to be sent to the httpc:request method.

Fires when about to perform an origin request.

origin_fetched

syntax: ledge:bind("origin_fetched", function(res) -- end)

params: res. The response table (does not include the body).

Fires when the status/headers have been fetched, but before the response is stored. Typically used to override cache headers before we decide what to do with this response. Note that unlike before_save below, this fires for all fetched content, not just cacheable content.

before_save

syntax: ledge:bind("before_save", function(res) -- end)

params: res. The response table (does not include the body).

Fires when we're about to save the response.

response_ready

syntax: ledge:bind("response_ready", function(res) -- end)

params: res. The response table (does not include the body).

Fires when we're about to serve. Often used to modify downstream headers separately to the ones used to determine proxy cacheability.

Protecting purge requests

Ledge will respond to requests using the (fake) HTTP method PURGE. If the resource exists it will be expired and Ledge will exit with 200 OK. If the resource doesn't exist, it will exit with 404 Not Found.

This is mostly useful for internal tools which expect to work with Squid, and you probably want to restrict usage in some way. You can achieve this with standard Nginx configuration.

limit_except GET POST PUT DELETE {
    allow   127.0.0.1;
    deny    all;
}
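With the restriction in place, an internal tool could purge resources like this (the hostname and paths are illustrative):

```shell
# Expire a single resource: responds 200 OK, or 404 Not Found if uncached.
curl -X PURGE http://www.example.com/some/page

# Expire a set of resources using wildcard syntax.
curl -X PURGE "http://www.example.com/some/*"
```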

Logging

For cacheable responses, Ledge will add headers indicating the cache status. These can be added to your Nginx log file in the normal way.

X-Cache

This header follows the convention set by other HTTP cache servers. It indicates simply HIT or MISS and the host name in question, preserving upstream values when more than one cache server is in play. For example:

  • X-Cache: HIT from ledge.tld (a cache hit, with no known cache layer upstream)
  • X-Cache: HIT from ledge.tld, HIT from proxy.upstream.tld (a cache hit, also hit upstream)
  • X-Cache: MISS from ledge.tld, HIT from proxy.upstream.tld (a cache miss, but hit upstream)
  • X-Cache: MISS from ledge.tld, MISS from proxy.upstream.tld (regenerated at the origin)

Author

James Hurst james@pintsized.co.uk

Licence

This module is licensed under the 2-clause BSD license.

Copyright (c) 2014, James Hurst james@pintsized.co.uk

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.