/wafflex

Primary LanguageCOtherNOASSERTION

Wafflex

An Nginx module for rule-based request filtering, rate-limiting, and access control for many Nginx servers sharing rules and data. It's loosely inspired by iptables -- well, not syntactically, or schematically, or even conceptually. I guess iptables is more like a spiritual predecessor, and Wafflex is its own animal designed for doing stuff to HTTP requests rather than network packets.

Conceptual Overview

Rules are simple, non-nesting, non-Turing-complete decisions to conditionally perform some actions on an HTTP request. Basically, one rule is one if-then-else or switch statement. Rules are organized in rule lists.

Rules can make use of limiters, which are pre-configured limits applied to named counters used for rate-limiting and flag checks.

A rule list is a sequence of rules that keeps executing rules until the end of the list, unless a rule executes a final action. Rule lists are stored in phase tables.

A phase table specifies lists of rule lists that are run at different phases in the HTTP request-response cycle (after headers, after request completion, after redirect, etc.). Each request phase can have one or more rule lists, which are executed in-order.

Finally, all of the above is stored in a rule set.

Syntax

Rule Set

The rule-set is the root-level structure that stores the phase table, named rule lists, named rules and limits for limiters.

{ //rule set
  "limits": { 
    "limiterName1": /*limiter*/,
    "limiterName2": /*limiter*/, 
    /*...*/
  }, 
  "phases": /*phase table*/,
  "lists": {   //named lists
    listName1: /*rule list*/, 
    listName2: /*rule list*/, 
    /*...*/
  },
  "rules": {   //named rules
    "ruleName1": /*rule*/, 
    "ruleName2": /*rule*/,
    /*...*/
  }
}

A rule set MUST have a phase table, but MAY NOT have named rules, limiters and lists. The rule lists and named rules MUST be uniquely named and can be referenced by name in the phase table. Limiters MUST also be uniquely named.

Phase Table

The phase table keeps track of rule lists associated with HTTP request/response phases.

{
  "connect": [ /*rule-list*/, /*rule-list*/, /*...*/ ],
  "headers": [ /*rule-list*/, /*rule-list*/, /*...*/ ],
  "request": [ /*rule-list*/, /*rule-list*/, /*...*/ ]
}

The possible HTTP request phases are:

  • connect: After a TCP connection is established
  • tls-connect : After TLS handshake is performed and TLS session is established
  • headers : After a request's headers are processed
  • body-data : After part (or all) of a request's body is read
  • request : After the entire request is processed
  • proxy-response : After an upstream response is received
  • response-headers : After response headers are generated, but before they are sent to the client.
  • response-data : After response headers and body are generated, but before they are sent to the client.
  • response : After response headers and body are generated, but before they are sent to the client.

Note that initially, a subset of these phases will be implemented (request, response, and maybe a few more)

Rule lists can be defined inside the phase table, or they can be referenced by name.

Rule List

An ordered list of rules.

//short-form:
[ /*rule*/, /*rule*/, /*...*/ ]

//long-form
{
  "name": "my-rules", //optional globally unique list name
  "rules": [ /*rule*/, /*rule*/, /*...*/ ]
}

Lists declared in the rule set's lists hash have their names set to their hash keys:

{ //rule set
  "lists": {
    "myList": [ /*rules*/ ]
  },
  /*...*/
}

//this creates a named list:
{ // rule list
  "name": "myList",
  "rules": [ /*rules*/ ]
}

All list names MUST be unique to the rule set.

Lists could also be unordered, meaning rule execution order is up to the module. Unordered lists can be heavily optimized through adaptive reordering and partial condition evaluation trees. (This is an advanced execution engine feature, and will not be part of the MVP.)

{
  "rules": [ /*rule*/, /*rule*/, /*...*/ ],
  "unordered": true
}

Rules can be define inside rule lists, and named rules can be referenced by the name:

{ //rule list
  "rules": [ "ruleName1", "ruleName2", {/*rule definition*/} ]
}

Rule

Rules are defined in JSON.

//example rule
{
  "name": "ratelimit",
  "info": "This rule rate-limits by ip",

  "if": {"#limit-break" : {
    "name": "rate-limit",
    "key": "$request_real_ip"
  }},
  "then": [
    {"#tag": "slowdown"},
    {"#reject": {"status": 403, "body": "Slow down!!!"}}
  ]
}

Rules have 3 forms:

If-rule
{
  "if": /*condition*/
  "then": /*actions*/
  "else": /*actions*/       //optional
}

{
  "if-any": [ /*condition*/, /*condition*/, /*...*/ ],
  "then": /*actions*/
  "else": /*actions*/       //optional
}

{
  "if-all": [ /*condition*/, /*condition*/, /*...*/ ],
  "then": /*actions*/
  "else": /*actions*/       //optional
}

if-any executes until the first true condition (think of it as if(condition1 || condition2 || ...))

if-all executes until the first false condition (think of it as if(condition1 && condition2 && ...))

Switch-rule
{
  "switch": [
    [ /*condition*/, /*actions*/],
    [ /*condition*/, /*actions*/],
    /*...*/
  ]
}
Unconditional rule
{
  "do": /*actions*/  
}

Rules can also have additional properties, such as key, name, track-stats and logging which will be discussed later.

Actions

Actions are things done to a request, like adding or removing a header, redirecting to an upstream server, rejecting a request with a 400 status code, or closing the connection. An action MAY end the processing of rules for the current request; this is called a final action.

/*actions*/       [ /*action*/, /*action*/, /*...*/ ] || /*action*/
/*action*/        "#action-name" || {"#action-name": /*params*/}
/*params*/        { /*...*/ } || [ /*...*/ ] || string || number

Arrays of actions are executed in-order until the end of the array, even in the presence of one or more final actions. Strings in action parameters are interpolated.

//examples

//array of actions:
[ {"#tag": "processed"}, "#accept" ]
[ "#reject", {"#tag": "rejected"} ] //will be tagged 'rejected' even though '#reject' is a final action.

//single action
"#reject"
{"#reject": 404}

Conditions

Conditions are statements that evaluate to true or false. They have the same form as an action:

/*condition*/        "#condition-name" || {"#condition-name": /*arguments*/}
/*arguments*/        { /*...*/ } || [ /*...*/ ] || string || number

As with actions, strings in condition arguments are interpolated.

//single condition
"#true" //always true
{"#match": ["$realip_remote_addr", "127.0.0.1"]} // true if request client's ip is 127.0.0.1

Limiters

Limiters are used to track a rate, and whether that rate exceeds some threshold. Limiters are defined outside of rules, rule lists, and phase tables.

//limiter syntax
{
  "name": "limiter-name" //globally unique
  "info": "optional limiter description",
  "interval": /*time-interval*/,
  "limit": number //maximum value for that interval
  "sync-steps": integer (default 4) //optional
  "burst": "burst-limiter-name", //optional
  "burst-expire": /*time-interval*/ //optional
}

/*time-interval*/ number (seconds) || nginx-time-range ("10s" /*10 seconds*/ || "1h" /*1 hour*/ || "5d" /*5 days*/ etc)

All limiters must have names unique to the rule set.

Using Limiters

Limiters are used to perform rate limit checks on numeric counters associated with a key specified in a rule. This counter is incremented during the execution of rules, and decreases linearly at a rate of limit/interval until it reaches 0. The counter values are shared and synchronized between Nginx servers.

A Limiter is checked and the corresponding counter auto-incremented with the #limit-break condition.

{ //limit
  "name": "request-rate",
  "interval": 60, //seconds
  "limit": 100
}

{ //rule
  "if": {"#limit-break": {
    "name": "request-rate", //limiter name
    "key": "$request_real_ip", //key for current limiter rate value
    //increment: 1 //increment current limit counter value by 1 by default
  }},
  "then": "#reject"
}

Limiter counter keys

Each limiter has a limit and a counter which is stored for that limit at a specified key. Different-named limiters store their own counters at the same key:

{ //limit
  "name": "request-rate",
  "interval": 60, //seconds
  "limit": 100
}

{ //another limit
  "name": "weekly-allowance",
  "interval": '7d', //7 days
  "limit": 1000000 // a million requests a week
}

{ //rule
  "if-any": [
    {"#limit-break": {
      name: "request-rate", //limiter name
      key: "$request_real_ip", //key for current limiter rate value
    }},
    {"#limit-break": {
      name: "weekly-allowance",
      key: "$request_real_ip" // same key as above, but tracking a different limiter's counter
    }}
  ],
  "then": "#reject"
}

Because it is common to use the same key for multiple limiter counters, the default key can be specified at the rule level:

{ //limit
  "name": "request-rate",
  "interval": 60, //seconds
  "limit": 100
}

{ //another limit
  "name": "weekly-allowance",
  "interval": '7d', //7 days
  "limit": 1000000 // a million requests a week
}

{ //rule
  "key": "$request_real_ip", //default key for limiter counters

  "if-any": [
    {"#limit-break": "request-rate"}, //uses default rule-level key
    {"#limit-break": "weekly-allowance"} //same as above
  ],
  "then": "#reject"
}

By default, the #limit-break condition auto-increments the numeric value stored at key specified by key. To test a limiter without incrementing the counter, use #limit-check, or explicitly set the increment argument for #limit-break to 0. A limiter counter can also be incremented as an [action][#actions] with #limit-increment

Bursty limits

Two limiters can be combined to create a limiter with a burst rate:

{ //limit
  "name": "burstable-rate-limit",
  "interval": 60, //seconds
  "limit": 100,
  "burst": "burst-rate"
  "burst-expire": "1h"
}

{ //burst rate
  "name": "burst-limit",
  "interval": "1h",
  "limit": 100000 // burst at 100K/hour
}

{ // rule using the bursty limit
  if: {'#limit-break': "burstable-rate-limit"},
  then: "#reject"
}

Here, the burstable-rate-limit has a burst rate defined by burst-limit. This means burstable-rate-limit starts rate-limiting only after the burst-limit has been exceeded. In the above case, the burstable-rate-limit(100/hour) is active only after burst-limit (100K/hour) is reached. The burstable-rate-limit remains in effect as long as burst-limit is exceeded, or for burst-expire amount of time after the burst-limit is first exceeded -- whichever is longer.

Limiter counter sharing

Limiter counters can be shared between Nginx instances. This allows for distributed rate tracking among a collection of servers. By default, counters are shared every time the value increases by 25% of the limit.

sync-steps

The sync-steps value controls how often counters are shared between Nginx instances. Between 0 and limit, the counter value will be shared sync-steps times. Put another way, the counter is shared every time its value increases by limit/sync-steps. Some examples:

  • sync-steps: 0: counter value is never shared with other Nginx instances.
  • sync-steps: 1: counter value is shared when the limit is reached, doubled, tripled, etc.
  • sync-steps: 4: counter value is shared every time it is increased by limit/4, i.e. every 25% of the limit
  • sync-steps: 100: counter value is shared every time it increases by 1% or limit, or 100 times between 0 and limit, between limit and 2 * limit and so on.

Limiters as flags

Limiters with a limit of 1 can behave like shared boolean flags with an expiration time.

{//limit
  "name": "ip-ban",
  "limit": 1,
  "interval": "1d"
  //ban flag that stays active for 1 day
}

{ //ban the ip when the request has a Ban-Me: 1 header
  "if": {"#match": ["$http_ban_me", "1"]},
  "then": {"#flag": {"name": "ip-ban", "key": "$request_real_ip"}} //flag this guy as banned
}

{ //check if banned
  "if": {"#flag-check": {name: "ip-ban", key: "$request_real_ip"}},
  "then": "#reject" // you're banned, dude. go home.
}

#flag-check is the same as #limit-check, and #flag same as #limit-increment.

String Interpolation

Strings passed to conditions and actions are automatically interpolated using the Nginx string interpolation syntax, with Nginx request variables.

"static string"
"$http_host" //evaluates to the value of the Host header
"foo/bar/${request_real_ip}:$url" //compound interpolated string

There's some room for heavy optimization in reusing interpolated strings, especially when used as keys, or if they are going to be hashed. Optimizations may not be present in MVP.

Condition List

#match

match strings for equality

{"#match": [ "string1", "string2", /*...*/ ]} //must all be the same to be true

#match-regex

match a string by regular expression

{"#match-regex": [ "string", "/regular_expression.*/"]}

Note that if the regular expression has any interpolation variables in it, it cannot be optimized and will need to be recompiled on every request.

#limit-break

Increment limiter counter, and check if a limit has been broken (exceeded)

{"#limit-break": {
  "name": 'limit-name',
  "key": "limiter-counter-key",
  "increment": 1 //default value
}}

If key is absent, the rule's default key is used.

#limit-check

Same as #limit-break, but with increment: 0

#flag-check

Same as #limit-break. Useful for Limiters with limit: 1 that are being used as flags.

#tag-check

Checks if a tag has been set for the request by the #tag command.

{"#tag-check": "tag-name"}

#match-subrequest

Perform a non-blocking subrequest to an Nginx location, and compare the response to a set of matching conditions.
(NOT included for MVP)

{"#match-subrequest": [/*subrequest*/, /*match_conditions*/]}

/*subrequest*/
"/subrequest_path" //HTTP GET subrequest to this path

{ 
  "path": "/subrequest_path",
  "method": "GET" // default
  "forward-headers": true //default. use headers from incoming request,
  "set-headers": {"Header-Name": "$http_header_name", "Other-Header-Name": "foobar"}
}

/*match_conditions*/
"ok" // response code 200-299
number // match response code number
{
  "code": number || [ code1, code2, code3 ],
  "headers": {
    "Header-Name": "match-Header-Value"
  },
  "body": "match-body-value"
}

#true

Always true. Useful as a default switch condition.

#false

Always false. Might be useful when writing rules.

Action List

#reject

Reject a request. This is a final action, after which no more rules will be processed for this request.

//short-form
"#reject"

//long-form
{"#reject": {
  "status": number, //HTTP response status code, default 403
  "body": "response_body" //HTTP response body, optinal
}}

#accept

Accept a request. This is a final action, after which no more rules will be processed for this request.

//short form
"#accept"

#proxy-pass

Pass request to an upstream proxy. See the Nginx proxy_pass command.
This is a final action, after which no more rules will be processed for this request.

{'#proxy-pass': "http://hostname-or-upstream-name/path"}

#proxy-set-header

Set proxy header. See the Nginx proxy_set_header command.

//set one or many headers
{'#proxy-set-header': {
  "Proxy-Header-Name":"$http_header_name",
  "X-Real-IP": "$request_real_ip"
}}

#limit-increment

Increment limit counter

//short form (if rule-wide default key is provided)
{"#limit-increment": "limiter-name"}

//long-form
{"#limit-increment": {
  "name": "limiter-name",
  "key": "limit-counter-key", //can be omitted if rule-level key given
  "increment": 1 //default
}}

#flag

Increment flag (limiter with limit:1) by 1. Same as #limit-increment

//short form (if rule-wide default key is provided)
{"#flag": "flag-name"}

//long-form
{"#flag": {
  "name": "flag-name",
  "key": "flag-key", //can be omitted if rule-level key given
  "increment": 1 //default
}}

#limit-reset

Set limit counter back to 0.

//short form (if rule-wide default key is provided)
{"#limit-reset": "limit-name"}

//long-form
{"#limit-reset": {
  "name": 'limit-name',
  "key": "limit-counter-key" //can be omitted if rule-level key given
}}

#flag-reset

Set flag back to 0. Same as #limit-reset

#tag

Set a string tag for the request. which adds a RoF-Tag-<tagname>: 1 header to the request.

{"#tag": "tag-name"}

#tag-reset

Deletes the given string tag for the request. which also removes the RoF-Tag-<tagname>: 1 header from the request. If tag is already absent, does nothing.

{"#tag-reset": "tag-name"}

#subrequest

Perform a non-blocking subrequest to an Nginx location.
(NOT included for MVP)

{"#subrequest": "/subrequest_path"}

{"#subrequest": {
    "path": "/subrequest_path",
    "forward-headers": true //default. use headers from incoming request,
    "set-headers": {"Header-Name": "$http_header_name", "Other-Header-Name": "foobar"}
  }
}

Synchronization

Rule sets can be loaded from Redis, a file, or Consul on Nginx startup. (Consul may be omitted for MVP). During runtime, rule sets, lists, limits, and counters are stored on a Redis server or cluster. (Cluster support may be omitted for MVP). Rule sets, lists, limits, and rules can be created, updated, and deleted at runtime via a command-line tool which runs updating commands on the Redis server. In turn, Redis notifies all the Nginx servers, and data is updated as-needed.

Statistics and Benchmarking

Aggregate rule execution information will be available for logging through variables available upon request completion:

  • $rof_request_time : time to run all rules
  • $rof_request_rules : number of rules processed
  • $rof_request_rules_percent : percent of rules processed
  • $rof_request_final_rule : name/id of rule that either accepted or rejected the request (empty if neither)
  • $rof_request_final_list : name/id of list that contained the final rule (empty if none)
  • $rof_request_final_phase : name of phase in the phase table

Individual rule statistics are not collected by default, but can be enabled on specific rules with the track-stats property:

{
  "if": /*condition*/,
  "then": /*action*/
  "track-stats": true
}

With track-stats enabled, runtime statistics for the rule will be collected, including totals for number of times executed, and the number of timers a request was rejeced via the rule. This data can be accessed via an Nginx location akin to the stub_status setting.

Logging

Specific rules' processing sequence can be enabled with the log property:

{
  "if": /*condition*/,
  "then": /*action*/
  "log": true
}

This will log each step of the rule's execution to a separate Wafflex log using the standard Nginx error log format, with the text being presented in a human-readable format.

Additionally, the logging of requests matching some pattern may be enabled in a manner to be determined later. (Not part of MVC)