nvie/decoders

Some way to influence/adjust error messages to redact certain fields in objects?

Conduitry opened this issue · 3 comments

I have an object I'd like to run through Decoders, and I'd like to log the errors that it throws when there are validation issues - but there's one field I'd prefer not to appear in the logs as it contains sensitive data. Is there a nice way to achieve this - or, perhaps, a nice way to have control more generally over the constructed error messages?

As a workaround, I could delete the sensitive key from the object ahead of time, run the remaining object through Decoders, and then reinstate the sensitive key afterwards, but I'm wondering whether there's some nicer way to achieve this that I'm missing. Thanks!

Yeah, this is definitely something I have been wanting to support this for a while now, and with the recent foundational improvements this may be a bit easier to implement.

Short-term solution

For right now, one option you have is to not use the default formatter (formatInline), but to use formatShort instead. The difference is that using formatShort will not echo back the input at all, and just describe where the validation error happened within that object. It's a different experience, but it would solve your problem more immediately. See these docs: https://decoders.cc/#formatting-error-messsages

More robust actual solution

Ideas for a future release to further support this:

  1. Allow for a sensitive() decoder, which would allow you to write your object decoder like this:
    object({
      username: string,
      password: sensitive(string),
    })
    This is not really an actual decoder in the classic sense, but it would get recognized and treated specially by the object family of decoders, to ensure the default formatter would avoid echoing back content that should get masked. Imagine an error output like this:
    {
      "username": 123,
                  ^^^ Must be string
      "password": ************,
    }
    
  2. Also, maybe it's a good idea to let the default formatter (formatInline) recognize the most common field name patterns in practice and simply always treat those values as sensitive, i.e. field names containing password, passwd, token, etc. That may be more secure by default. Echoing back values for keys with those names is never a good idea anyway.

Maybe you have some ideas for how to shape those APIs yourself? Would love to hear them!

I haven't thought a lot yet about what I think an ideal API for this would be, but I will say the idea of adding certain keys to a redaction blacklist makes me uneasy, honestly. If someone wants to actually render one of those in the error message, does that require a separate API for re-enabling those, or are you just stuck with the redaction and there's nothing you can do about it? And where is the line drawn for what makes a 'common' field that often has sensitive data? Is that going to keep expanding if people open issues about 'hey, I printed sensitive field <x> into my logs and then my system got compromised'? And adding any fields to this sort of list would be a breaking change every time.

A sensitive() decoder seems nice, but the fact that that logic needs to live within the object() et al decoders feels a bit messy. You're not really able to take advantage of tree-shaking nicely in that case, because everyone would have to pay for the possibility that they might have a sensitive() within their object() whether they actually did or not. It also seems to mess with the API. Either sensitive() would need some secret backchannel way to convey its presence to object() etc. that is inaccessible to users, or else there would need to be a whole new mechanism for decoders to pass additional metadata around, which also doesn't feel great to me.

I'd probably learn towards three additional redacting variants of object()/exact()/inexact(), although I'm not sure what a nice API would look like for those.

"I haven't thought a lot yet about what I think an ideal API for this would be, but I will say the idea of adding certain keys to a redaction blacklist makes me uneasy, honestly. […] Is that going to keep expanding if people open issues"

Yep, this would be a massive downside, and I don't really like it either.

"A sensitive() decoder seems nice, but the fact that that logic needs to live within the object() et al decoders feels a bit messy. […] Either sensitive() would need some secret backchannel way to convey its presence to object() etc."

Edge cases pop up quickly once you start thinking about it.

For example, should this construct even be allowed or not? Does it even make sense?

either(sensitive(string), whatever)  // Can you compose this thing?

I don't think so. This example illustrates that sensitive() isn't really a Decoder itself in the strict sense, as it isn't freely composable. Instead, it serves as an "instruction" to object() et al to construct their error messages in a slightly different way than the default. Therefore it should only be allowed at the "top level" of an object definition.

object({
  username: string,
  password: nullable(sensitive(string)),  // ❌ Not allowed
  password: sensitive(nullable(string)),  // ✅ Allowed
});

Another API to convey the exact same information would be to add a second config param to object() et al:

object(
  {
    username: string,
    password: string,
  },
  { mask: ["password"] },  // Slightly less DRY alternative, because you have to repeat the field name here
);