owasp-modsecurity/ModSecurity

Incorrect escaping in @rx operator with macro expansion

Closed this issue · 18 comments

When using macro expansion with @rx operator, escaping is incorrectly applied.

Example:
SecRule ARGS "@rx localhost|%{TX.OtherHosts}"
will lead to something like SecRule ARGS "@rx localhost\|myhost1\|myhost2"
This screws up all regexes, as regex special characters will be escaped, even outside the macro-expanded varaiables.

In case I want to test against %{TX.MyHost} that isn't a regex, I can use the @contain operator.

What happens to be the version of ModSecurity in question?

Sorry, 2 (trunk).
PS: It seems that I cannot tag the issue with the version number when creating it

If you fill the version number in the issue template, the automation script will tag accordingly.

Hi @marcstern ,

I'm not sure that what you are seeing is incorrect per se. When using string content like that, there isn't any way for the software to know that characters in the string are not to be treated as string literals. I.e. the code needs to make one assumption or the other.

In your case, I assume you are constructing the %{TX.OtherHosts} variable? In that case, you could perhaps consider constructing it with space delimiters and then use something like "@pm localhost %{TX.OtherHosts}"?

If that doesn't meet your needs, if you would like to describe your use case in more detail, I could probably suggest additional alternatives.

Hi @martinhsv
I agree about the assumption (one way or another) but ...
If I want to treat the string as a litteral, I use the operators that are meant for literals (pm, contains, ...). If I want to treat the string as a regex, I use rx. This is simple and covers all use cases.
If rx treats the string as a litteral, I have no option for using a regex with a macro.
Work-arounds like you proposed are not possible because the real situation is obviously much more complex than the trivial example I gave (imagine TX.OtherHosts=myhost1|.*[.]mydomain for instance).

Hi @marcstern ,

I agree that there is a limitation inherent to the current assumption that is made.

However, I suspect that there are situations where the opposite assumption would also create an undesired limitation. While you have been able to utilize 'pm' and 'contains' for cases where you want special characters to be treated as literals, that may not always be true for everyone. Perhaps a TX variable needs to populated from query argument content, then combined with some complex fixed pattern and then run as a regex -- but that original query arg content needs to be treated as literal.

An additional (possibly minor) consideration is additional risk of creating undesirable regex patterns. If a regex pattern is derived from a variable that is populated from untrusted input, there is added risk of inadvertently creating opportunity for ReDoS attacks.

On the whole, I'd be reluctant to recommend that we change the current (default) rx variable-expansion functionality unless there is indication from the community that non-escaping is greatly preferred by the community in general.

If we really have a problematic gap in available functionality, a better possibility might be to create a new configuration item that controls this (possibly with a matching ctl: equivalent). Perhaps something like this SecRegexVariableExpansionEscape (with a default of 'On').

It would be interesting to assess specific use cases, though, just to make sure that there aren't reasonable workarounds already available.

'SecRegexVariableExpansionEscape' is good for me (even default to off if we want to stay compatible).
Even a configure flag is OK for me.

If you can find a (generic) work-around for this, I would be happy:
SecRule ... "setvar:TX.OtherHosts=%{TX.OtherHosts}|host[.]company1[.]com"
SecRule ... "setvar:TX.OtherHosts=%{TX.OtherHosts}|.*[.]company2[.]com"
SecRule ARGS "@rx https?://(?:localhost|%{TX.OtherHosts})$" ...

I'll assume that your first two rules are conditional, such that you cannot simply put all of the options into your third rule ...

SecRule ARGS "@rx https?://(?:localhost|host\.company1\.com|.*\.company2\.com)$" ...

One alternative approach that you could consider is to structure your rules the other way around. First extract the host into a variable:

SecRule ARGS "@rx https?://(.*)$" "capture,setvar:TX.ExtractedHost=%{TX.1}...

And then (ignoring localhost for a moment) have your 1st and 2nd rules run afterwards.

SecRule TX:ExtractedHost "@Streq host.company.com" "chain ...
SecRule [whatever VARIABLES+OPERATOR you need to preserve from your original first rule] [actions formerly associated with third rule]

SecRule TX:ExtractedHost "@rx .company2.com$"" "chain ...
SecRule [whatever VARIABLES+OPERATOR you need to preserve from your original second rule] [actions formerly associated with third rule]

No way, because the hostnames are added dynamically (for instance when defining a new vhost) and we don't know even their number at configuration time.
Note that, for every work-around you'll find, I'll come back with another unsupported use case ;-)
We need to have some dynamic regexes sometimes ...

Hmm. Well, I'll leave this open for the time being as an enhancement request. But without one or more fully defined use cases where the need is clear, I'm not sure how highly it will get prioritized.

OK, one full use case:
I'm performing a whitelisting of URL (scheme:fqdn) that can be used inside an ARG.
Some URL are defined statically (no problems), some are dynamic.
Dynamic methods:

  1. Grabbing the allowed URL from a header coming from the back-end (we're in proxy mode). In this header, I can have a fqdn, a domain or a complex regex.
  2. Allowing all virtual hosts that were created (we register the name in a var dynamically through a macro).
  3. Allowing only test site inside a domain (grabbed from point 1) when we are in the test environment (i.e. "test-.*%{allowedDomains}").
  4. And several other ones dynamic methods, needing some kind of regex depending on their intent.

Another use case: whitelisting of characters in the URL.
We have macros allowing some characters based on the used framework (in some locations). These macros may be used at several places and must be combined to allow all the needed characters.
Example of application profile:

Use FrameworkSharepoint
...

Use AllowCharsInUrl "{}"

The macro FrameworkSharepoint allows () in URL on top of the ususal characters
At the end, we need to cumulate both directives to allow alphanumerics, () and {} in location y.
As the application maintainer, you don't know what's inside the macro FrameworkSharepoint (it's managed by another entity).
There's no way to support this feature (validateByteRange doesn't support extension either).

I took an example very easy to understand out of context, so this particular case could be solved by adding macro-expansion to validateByteRange but the same logic (incremental population of pattern) is needed for other uses where only regex can be used.

Just to add a little bit of extra information ...

There are actually two escaping subcases here:

  1. If an rx pattern contains both static content and macro expansion content (e.g. "@rx a(b|c)d%{TX.MyCustomVar}", in v2.9 regex special characters in both the the macro value and the static portion 'a(b|c)d' will have the special characters escaped. This was in the example in the original posting but since most of the discussion so far has focused on the variable value, I thought I'd call it out separately. I'm inclined to agree that adding escapes in the static portion is undesirable behaviour. It provides no advantage, since the static portion of the regex pattern can simply be composed with escape characters embedded in it if that is what the rule writer wants. I expect that this is non-intuitive behaviour and moreover actually has the effect of restricting the flexibility available to the rule writer.
  2. The escaping of characters within the value that the macro expands to (i.e. the content of TX.MyCustomVar: The preferred behaviour in this case is more debateable, but as it's been discussed above, I won't belabour it.

libmodsecurity does not do either of the above two subcases of escaping special characters (tested in v3.0.4). This is not particularly helpful, in the short term, to Apache users who cannot upgrade to ModSecurity v3, but it seemed worth hightlighting.

ModSecurity does have a quite a few ways to accomplish things, so it's likely that any particular detailed use case can be achieved with a workaround. The possibility of a workaround, of course, does not preclude an enhancement that might make things easier for rule writers.

Very good analysis @martinhsv.
I'm happy to see that V3 behaves the non-escaped way.

Any chance to have this merged in 2.9? Thanks

To update the second subcase as described here: #2357 (comment) ...

With PCRE it is possible to indicate that an entire section of a regex pattern should have normally-special characters treated as literal characters instead, using the \Q...\E encapsulation constructs.

This means that a rule writer who wants characters within the macro expansion of a variable to be treated literally can specify that in the rule composition. Using the example from the previous content, the rule writer could opt to specify: "@rx a(b|c)d%\Q{TX.MyCustomVar}\E"

In other words it would appear that the current ModSecurity v2 behaviour of automatically escaping special characters does not really provide even a debatable advantage.

By abandoning the auto-escaping in v2, the rule writer would have the flexibility to choose whichever treatment is desired under the circumstances.

Hi @marcstern ,

There is now a PR for this issue ( #2912 ), if you are interested in trying it out before it gets merged.

Closed via #2912