open-policy-agent/opa

nd_builtin_cache format is problematic

tsandall opened this issue · 1 comments

The format of the nd_builtin_cache field in decision_logs makes it difficult and computationally expensive to mask sensitive values:

  -----------8<------------
  "nd_builtin_cache": {
    "http.send": {
      "[{\"method\":\"get\",\"url\":\"http://example.com\"}]": {
        "body": null,
  -----------8<------------

The cache currently represents each call as a key-value pair under the built-in name. The arguments to the call are stored in the key and the response is stored in the value. There are two problems with this format.

  1. If we ever wanted to include other information about the call (e.g., metrics) we would not have a place to put it.
  2. More importantly, if there are any sensitive values in the args (e.g., the HTTP authorization header) masking them requires relying on regexp, rebuilding the entire map, etc.

I think we should find a different format. A proposal:

nd_builtin_cache: {
   calls: {
     http.send: [
         {
            request: [
                {method: get, url: "...", headers: {authorization: "..."}}
            ]
            response: ...
         }
     ]
   }
}

Each builtin call would be keyed by the function name and then the value would be the unique set of calls made. Masking out values could look like this:

# mask out request authorization header and all response headers
suffixes_to_mask := ["request/0/headers/authorization", "response/headers"]
mask contains op if {
  some idx
  input.nd_builtin_cache.calls["http.send"][idx]
  some suffix in suffixes_to_mask
  op := {"op": "remove", "path": sprintf("/nd_builtin_cache/calls/http.send/%v/%v", [x, suffix])}
}

This format would give us room to add additional fields in the future if needed. Since folks are already using the nd_builtin_cache feature and masking values in it, I think we should deprecate the current field (and configuration) and introduce a new one.

I agree in general, but I'd propose we use more generic names:

    http.send: [
         {
            args: [
                {method: get, url: "...", headers: {authorization: "..."}}
            ]
            result: ...
         }
     ]

Let's do request -> args and response -> result. Since there are other non-deterministic builtins (e.g. rand.intn), for which request/response isn't a good description.