swift-server/async-http-client

Privacy & security: Clarify what we log at which log level

weissi opened this issue · 5 comments

weissi commented

IIRC the initial design that we don't log anything private like URL, headers or even bytes at all. That might be a little strict though.

But it's very important to clarify what level we're potentially logging sensitive things at and if there's configuration to change so.

Right now it seems that we're logging the actual bytes of HTTP traffic without even documenting that. I think this needs to be clarified.

My personal opinion:

  • Nothing sensitive (i.e. no URLs/headers/bytes/...) logged at debug
  • URLs (but not headers/bytes) fine to log at trace

Very happy to change my opinion but this needs clarification.

Great that you bring this up. In my past work, I was very heavily involved in the GDPR discussions especially around logging PII data and URLs were a common thing that got missed. I do agree that we should not log any of that not even the header keys since they are also sometimes used to include PII information.

In general, I don't like the approach of tying this to a log level since that still can lead to PII information being logged when somebody is trying to debug a totally different part of an application. Another idea that I had was introducing a logging configuration to AHC similar to what we do in service-lifecycle. This would allow us to let users configure logging keys used by AHC and we can make the ones for headers/urls/etc stand out and optional to indicate that these may contain PII

weissi commented

Yes, doing something with the keys is also important. We definitely need to document what we do here (there's a logging design doc already in this repo) and make it such that a user can prevent PII from being logged at all.

This issue is of course affecting much more than just AHC but AHC should help. The only proper solution is to explicitly allow-list metadata keys in the LogHandler that can be logged and everything else probably needs to be scrambled/removed/hashed/...

I wonder if scrambling in a LogHandler or making logging keys configurable in every library is better. Like if every log key becomes optional with some sane default values that might work as well. I agree though we should document it here and also in the broader ecosystem

weissi commented

I wonder if scrambling in a LogHandler or making logging keys configurable in every library is better. Like if every log key becomes optional with some sane default values that might work as well. I agree though we should document it here and also in the broader ecosystem

Yeah, everything is kinda tricky. A library just can't know, sometimes URLs contain PII and sometimes they really don't. Even within the same app some part might be using AHC with sensitive URLs and another part might use AHC where logging URLs is completely benign.

Yeah, everything is kinda tricky. A library just can't know, sometimes URLs contain PII and sometimes they really don't. Even within the same app some part might be using AHC with sensitive URLs and another part might use AHC where logging URLs is completely benign.

Right, that's why I think a per client config that you can pass the logging keys might work here. Users are then able to either set the keys nor not depending on if they know that there is no PII in there.