open-policy-agent/opa

WASM: Some builtins don't work with strings >64 characters as inputs

sandhose opened this issue · 4 comments

Short description

Some builtins, at least regex.find_all_string_submatch_n, don't work for input strings longer than 64 characters

Steps To Reproduce

repro.rego:

package repro

const_c64 := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
const_c65 := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

test.input_no_dollar_c64 { match_no_dollar(input.c64) }
test.input_no_dollar_c65 { match_no_dollar(input.c65) }
test.input_dollar_c64 { match_dollar(input.c64) }
test.input_dollar_c65 { match_dollar(input.c65) }
test.const_no_dollar_c64 { match_no_dollar(const_c64) }
test.const_no_dollar_c65 { match_no_dollar(const_c65) }
test.const_dollar_c64 { match_dollar(const_c64) }
test.const_dollar_c65 { match_dollar(const_c65) }

match_no_dollar(s) = m {
	[m] = regex.find_all_string_submatch_n("^.*", s, 1)
}

match_dollar(s) = m {
	[m] = regex.find_all_string_submatch_n("^.*$", s, 1)
}

input.json:

{
  "c64": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "c65": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
}

Running with -t rego

They all evaluate truthy:

$ opa eval --format pretty -t rego -d ./repro.rego 'data.repro.test' -i input.json 
[
  "const_dollar_c64",
  "const_dollar_c65",
  "const_no_dollar_c64",
  "const_no_dollar_c65",
  "input_dollar_c64",
  "input_dollar_c65",
  "input_no_dollar_c64",
  "input_no_dollar_c65"
]

Running with -t wasm

They all evaluate truthy:

$ opa eval --format pretty -t wasm -d ./repro.rego 'data.repro.test' -i input.json 
[
  "const_dollar_c64",
  "const_dollar_c65",
  "const_no_dollar_c64",
  "const_no_dollar_c65",
  "input_dollar_c64",
  "input_no_dollar_c64",
  "input_no_dollar_c65"
]

Notice how input_dollar_c65 is the only test evaluating as undefined.
So this issue is only when:

  • the string comes from the input
  • it is more than 64 characters long
  • the regex matches until the end of the string

Expected behavior

That it correctly evaluates, both in wasm and rego mode

Additional context

Tested both on the latest version and old version, as far back as v0.36.0

I tried to see if it affected any natively implemented builtin or is specific to the regexp one. I did this is a simplified repro rego:

package repro

c65 := count(input.c65)
c64 := count(input.c65)

and it turns out that the result of this is the same for -t wasm and -t rego:

{
  "c64": 63,
  "c65": 64
}

Surprise! The input data is one character short, it seems. But that only slightly varies the bug report, not in any material way.

Also, I guess you've already ruled out that it's a more general problem, since without the trailing $ the outcomes are as expected.

So I'll keep digging.

OK at this point, I'm not sure what works and what doesn't and what shouldn't 😅 -- We're passing UNANCHORED to re2 here and I guess that makes sense for submatches, but it might not square with anchored regexps fed into find_all_string_submatch_n.

So, despite this being somewhat buggy -- Where did the need arise for you to use an anchored RE with find_all_string_submatch_n?

Coming back to this! 🔍

☝️ PR is ready. Thanks again for helping uncover this ✨