zalando/logbook

Extract client identifier from authorization token

whiskeysierra opened this issue · 24 comments

Detailed Description

Logbook should enrich requests with the subject from the JWT token in the Authorization header, if present.

Context

Having the id as part of the requests would make it way easier to identify clients which in turn helps when:

  • identifying unauthorized access issues
  • usage analysis

Possible Implementation

  • introduce the concept of an attribute
  • attributes are simple key-value pairs (tbd type of value)
  • a request/response can have multiple attributes
  • attributes should be derived/created from requests/responses before any filtering (e.g. obfuscation)
  • built-in attribute extractor for sub from JWT token
    • detect JWT tokens: Bearer prefix + 3x base64 data separated by dots
    • remove Bearer prefix
    • split at .
    • base64 decode payload, i.e. the second element
    • parse JSON
    • read properties in order and return the first one that is present
      • https://identity.zalando.com/managed-id (Zalando employee tokens)
      • sub
    • don't hard code priorities, but rather allow to configure a list of names, defaults to ["sub"]
  • extend JsonHttpLogFormatter to include attributes (tbd, top level? nested? name clashes?)

Employee Token

{
  "sub": "3b66d47c-d886-4c63-a0b9-9ec3cad7e848",
  "https://identity.zalando.com/realm": "users",
  "https://identity.zalando.com/token": "Bearer",
  "https://identity.zalando.com/managed-id": "wschoenborn",
  "azp": "ztoken",
  "https://identity.zalando.com/bp": "810d1d00-4312-43e5-bd31-d8373fdd24c7",
  "auth_time": 1540188140,
  "iss": "https://identity.zalando.com",
  "exp": 1541411248,
  "iat": 1541407638
}

Service Token

{
  "sub": "stups_sales-order-service",
  "https://identity.zalando.com/realm": "services",
  "https://identity.zalando.com/token": "Bearer",
  "azp": "stups_sales-order-service_389e4e16-0695-45df-9afd-d9be0ffab456",
  "https://identity.zalando.com/bp": "810d1d00-4312-43e5-bd31-d8373fdd24c7",
  "iss": "https://identity.zalando.com",
  "exp": 1541411315,
  "iat": 1541407705,
  "https://identity.zalando.com/privileges": [
    "com.zalando::loyalty_point_account.read_all"
  ]
}

Links

Your Environment

  • Version used: 1.11.1

This should probably be configurable and default to being disabled, the reason being that the subject often contains users email address, which you do not want to log for data protection reasons.

Good point!

See also #373

Alternative could be to integrate with spring security and log SecurityContextHolder.getContext().getAuthentication().getName()
to keep it auth-method agnostic.

@AlexanderYastrebov That requires a spring dependency.

We're extracting data from the JWT to the MDC already, but after validation of the token; would not want to log any of the content before that. So some kind of hook for validation would be desirable.

Which JWT parsers are you considering? See https://github.com/skjolber/java-jwt-benchmark for a few.

For Spring my experience is that it is limited how much information there is to be found about authorization at request-response-logging-time, since that depends on the implementation which is called later up the chain. For example within the same app some REST methods do not require authorization while others do, and this is enforced by @PreAuthorize and/or even Open Policy Agent rules within the RestControllers.

It is perhaps better to add @ControllerAdvice for AccessDeniedException and friends with extra logging (including dumping contents of JWT) if there is a violation.

but after validation of the token; would not want to log any of the content before that

Why not?

Which JWT parsers are you considering?

Tbh, my idea was to just do it by hand using the Java standard library (split + base64) and Jackson (json).

Well logging details about the JWT goes into the same boat as logging request bodies before authenticaiton/autorization check. But I guess as long as there is not misunderstandings, it is fair to log those details. We're logging JWT details in the MDC, my first impression was that there was potential for mixing up verified and unverified credentails, but technically those should live in seperate parts of the log statements.

I guess 'by hand' parsing is fair if there is no validation involved.

I guess 'by hand' parsing is fair if there is no validation involved.

I believe it's even beneficial to log client identifiers especially for unauthorized requests since that may give you an indiciator who to talk to (assuming no shady intention).

So what does the desired output from logging look like? I'm not so sure about these attributes, would it not be more simple to just transform the header presentation, i.e. in the HttpLogFormatter?

I'm not so sure about these attributes, would it not be more simple to just transform the header presentation, i.e. in the HttpLogFormatter?

We obfuscate the Authorization header, so in the formatter there wouldn't be any way to do that.

So what does the desired output from logging look like?

{
  "origin": "remote",
  "type": "request",
  "correlation": "2d66e4bc-9a0d-11e5-a84c-1f39510f0d6b",
  "protocol": "HTTP/1.1",
  "sender": "127.0.0.1",
  "method": "GET",
  "path": "http://example.org/test",
  "headers": {
    "Accept": ["application/json"],
    "Content-Type": ["text/plain"]
  },
  "attributes": {
    "subject": "willi.schoenborn@zalando.de"
  },
  "body": "Hello world!"
}

Looking at some of the headers we're using, a lot of them actually contain structured data, like

X-Shopify-Shop-Api-Call-Limit: 1/80
Strict-Transport-Security: max-age=7889238
Set-Cookie: BIGipServerpool_posten_api.x.com_7460=1622359.9345.0300; path=/; Httponly; Secure

i.e.

{
  "origin": "remote",
  "type": "request",
  "correlation": "2d66e4bc-9a0d-11e5-a84c-1f39510f0d6b",
  "protocol": "HTTP/1.1",
  "sender": "127.0.0.1",
  "method": "GET",
  "path": "http://example.org/test",
  "headers": {
    "Accept": ["application/json"],
    "Content-Type": ["text/plain"],
    "X-Shopify-Shop-Api-Call-Limit": {
       value: 1, 
       limit: 80
    }
  },
  "attributes": {
    "subject": "willi.schoenborn@zalando.de"
  },
  "body": "Hello world!"
}

Would it be possible to 'capture' some of that, possibly also transforming authorization to

{
  "origin": "remote",
  "type": "request",
  "correlation": "2d66e4bc-9a0d-11e5-a84c-1f39510f0d6b",
  "protocol": "HTTP/1.1",
  "sender": "127.0.0.1",
  "method": "GET",
  "path": "http://example.org/test",
  "headers": {
    "Accept": ["application/json"],
    "Content-Type": ["text/plain"],
    "Authorization": {
       "sub": "stups_sales-order-service", 
       "iss": "https://identity.zalando.com"
    }
  }
  "body": "Hello world!"
}

if so desired?

I guess transforming the Authorization header, rather than filtering it, would be not reveal confidential information. Also, was it ever considered to just remove (filter) the token signature instead of the whole value? At least that would prevent someone taking a token from the logs.

The subject is already confidential, see #381 (comment)

But that is (application-specific-) misuse of the Subject. Logging 'who did what' becomes impossible without the subject?

Also, was it ever considered to just remove (filter) the token signature instead of the whole value?

That sounded like a proposal to change the current behavior of obfuscating the Authorization header completely. It would, again by default, expose subjects which is not ideal.

Logging 'who did what' becomes impossible without the subject?

That's totally desired, but it should be opt-in, so users need to make a conscious decision whether to use it or not. I want to be secure by default.

Might be that I misinterpreted your second to last comment.

I agree opt-in and not changing current default behaviour is desired.

So for an opt-in solution, what do you think about a structured header output / transformation approach (like my comment with JSON example above)?

So for an opt-in solution, what do you think about a structured header output / transformation approach (like my comment with JSON example above)?

I believe that's an orthogonal concern and would deserve its own issue/discussion.

qvdk commented

Hi everybody,

I am looking for this feature. Is there a way to log the subject only?

Addressed by #1589.