rat
is a sound tool to detect exponential
Regular Expression Denial of Service (ReDoS) attacks.
Since the algoirthm is proved to be sound, the tool cannot raise false
negatives.
This means that if rat
determines that a regular expression is safe, it is
impossible for it to be exploited by an attacker.
Furthermore, rat
takes advantage of efficient data structures to be
particularly fast.
The paper is available here. The preprint PDF versione is here.
To compile the project you need opam installed, with a version of the OCaml compiler >= 4.13.0.
make deps # Install the dependencies.
make # Build the project.
make test # Run the tests.
make install # Install the executable.
To run rat
without installing it:
dune exec rat -- <args>
Run with --help
to print the help message.
If there are no arguments, the command runs the interactive interpreter.
Assume we have rat
installed on the machine.
The following command analyzes the regular expression (a|a)*b
:
rat --regex '(a|a)*b'
The output should be similar to:
~ ⟦🐁⟧ ~
Exponential ReDoS: true
Exploit string: {
prefix = ''
pump = 'a'
suffix = ''
}
Example: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Runtime(ms): 0.399
The analyzer reports that the regular expression is vulnerable to a ReDoS attack, and prints an exploit string. The runtime includes only the time to run the analysis, and does not take into account the time to create a printable exploit string. Since in order to do this we must run an exponential algorithm (to build the minimal DFA recognizing the attack language), this can take a lot of time. The implementation of this algorithm can be improved considerably, and building the exploit string is not a necessary part of the analysis.
It is possible to set the semantics of the matching with the option
--semantics
.
If you are not sure, use the default option, which is match
.
In this case, the analyzer assumes that the input regular expressions match
an input string even if just a prefix of the input string matches the regular
expression.
It is the default behaviour of matching engines, and corresponds to the
behaviour of the function match
in the re
module in Python.
If the fullmatch
semantics is used, then the analyzer assumes that matching
engines compute the language membership.
The fullmatch
semantics is the one used in the fullmatch
function in Python.
If you read the paper about rat
(available soon!) and you are interested in
trying the examples, you should use --semantics fullmatch
, in all other
cases the match
semantics is more appropriate.
It is possible to print the full attack language for a regular expression
with the option --show-lang
.
Since this language is the specification of the attack words, it becomes quickly
difficult to read it.
rat
cannot analyze a wide variaty of non-regular constructs in regular
expressions, most notably backreferences and lookarounds.
Some non-regular constructs, such as $
, are supported in a limited form.
Future improvements might fix this.