coreruleset/go-ftw

Race condition in phase 5

theseion opened this issue · 11 comments

Tests for rules in phase 5 are subject to a rare race condition, as exhibited in the following log for tests 980170-2 and 980170-3 (line breaks between rules added for readability):

[Sat Mar 18 16:47:21.474075 2023] [security2:error] [pid 193:tid 140523746522880] [client 172.18.0.1:39150] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "04709bb3-6b2b-477d-aa14-76a89363b2dd"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg7QAAANg"]
[Sat Mar 18 16:47:21.476378 2023] [security2:error] [pid 193:tid 140524082149120] [client 172.18.0.1:39164] [client 172.18.0.1] ModSecurity: Warning. Pattern match "(?:^([\\\\d.]+|\\\\[[\\\\da-f:]+\\\\]|[\\\\da-f:]+)(:[\\\\d]+)?$)" at REQUEST_HEADERS:Host. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf"] [line "761"] [id "920350"] [msg "Host header is a numeric IP address"] [data "127.0.0.1"] [severity "WARNING"] [ver "OWASP_CRS/4.0.0-rc1"] [tag "modsecurity"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-protocol"] [tag "paranoia-level/1"] [tag "OWASP_CRS"] [tag "capec/1000/210/272"] [tag "PCI/6.5.10"] [hostname "127.0.0.1"] [uri "/"] [unique_id "ZBXrGVXqtKqlnATxdUEg7gAAAMQ"]
[Sat Mar 18 16:47:21.480771 2023] [security2:error] [pid 193:tid 140524098930432] [client 172.18.0.1:39172] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "04709bb3-6b2b-477d-aa14-76a89363b2dd"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg7wAAAMM"]


[Sat Mar 18 16:47:21.483110 2023] [security2:error] [pid 35:tid 140523931117312] [client 172.18.0.1:39180] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "c50b1e8e-e8a5-46d1-afc8-bf82368390f4"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGcs3WOXFKxhOjQ9D-AAAAE0"]
[Sat Mar 18 16:47:21.483333 2023] [security2:error] [pid 193:tid 140524082149120] [client 172.18.0.1:39164] [client 172.18.0.1] ModSecurity: Warning. Unconditional match in SecAction. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "96"] [id "980170"] [msg "Anomaly Scores: (Inbound Scores: blocking=3, detection=3, per_pl=3-0-0-0, threshold=5) - (Outbound Scores: blocking=0, detection=0, per_pl=0-0-0-0, threshold=4) - (SQLI=0, XSS=0, RFI=0, LFI=0, RCE=0, PHPI=0, HTTP=0, SESS=0, COMBINED_SCORE=3)"] [ver "OWASP_CRS/4.0.0-rc1"] [tag "modsecurity"] [tag "reporting"] [hostname "127.0.0.1"] [uri "/"] [unique_id "ZBXrGVXqtKqlnATxdUEg7gAAAMQ"]
[Sat Mar 18 16:47:21.489998 2023] [security2:error] [pid 193:tid 140523830429440] [client 172.18.0.1:39206] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "c50b1e8e-e8a5-46d1-afc8-bf82368390f4"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg8AAAANM"]

Rule 920350 is triggered as expected for the first test but rule 980170 is not. However, looking at the unique_id field it becomes clear that rule 980170 did in fact trigger but was reported as part of the second rule (the msg field of rule 999999 denotes start and stop ID of a rule).

The logic for finding triggered rules in the log is:

  1. send test request to mark start
  2. run test
  3. send test request to mark end
  4. while end marker not in log
    1. send test request to flush log
  5. check test against lines between start and end markers

Apparently, the log entry for rule 980170 may under some circumstances be written to the log much later and, which is worse, independently of other log entries of the same phase (both 999999 and 980170 run in phase 5), otherwise it would have shown up before the start marker for test 980170-3.

fzipi commented

Is there an easy fix for this?

Not that I'm aware of. I didn't dive deeply though.

fzipi commented

One thing that I see in common there is the unique_id. Maybe we can use it?

Maybe... Does that work reliably?

This happened again and caused an automated test run to fail on the CRS repo the other day. So, this definitely remains an issue.

We can't use unique_id because we don't know how many lines to expect.

I have an idea how to solve this though. Unfortunately, it will require replacing the current mechanism (maybe with a transition phase). The issue we are facing is that a test request will run in a thread / worker that is different from the marker requests. Hence, the marker requests may complete while processing of the test request is still ongoing. The fix thus has to be to use the same thread / worker as the test request.

For this to work we will need an additional rule that logs in phase 5 of every test request. Then we know that we can look for this special log entry to find the end of the log lines for a given test request. Essentially, the end marker we currently use would be replaced with a different end marker. The start marker can still be used because there's no race condition in that case.

I'm pretty sure this will work, however, I'm not sure wether a drop action would skip phase 5. If that is the case, then we're pretty much screwed... @dune73?

Ping @dune73.

dune73 commented

Sorry, I overlooked this so far.

Unfortunately, phase 5 is indeed skipped when the drop action is used. I suspect this is because the idea with drop is to free the thread immediately from the connection. I wonder how garbage collection works in this case ...

Either way, it's an unfortunate situation. But then I would not invest too much into phase 5. The only rule we use in phase 5 is 980170 (CRSv4) and it's not a very important rule. I would probably be pragmatic and test at reporting level 0 where the rule is inactive.

I have been encountering this problem more and more frequently over the past week or so, for what it's worth. Several PRs have been marked as 'failing tests' in error.

Thanks. I'm working on a workaround.