RS Revoking access token
jricher opened this issue · 14 comments
Should the RS be given the ability to revoke or otherwise manage access tokens?
Originally posted by @adeinega in ietf-wg-gnap/gnap-core-protocol#383 (comment)
I believe an RS has to have a way to revoke ATs for security reasons, say when ATs were "exposed" somewhere, by mistake or on purpose: for example, through log files/events or in SCM (Source Code Management) commits, and so forth.
Furthermore, an RS may expose, say, an /rs/revoke endpoint so that a client wouldn't need to revoke an AT on its own, or could do that as an additional security measure, so that the RS, along with the AS, explicitly knows that the client decided to revoke the AT and that this particular AT isn't active/valid anymore.
Right now, it's nearly impossible to achieve, as the client on the RS's side doesn't know the dynamic location of the management endpoint for each AT.
I strongly disagree with having token management functions at the RS, such as /rs/revoke, exposed to the client. This is a gross violation of role separation. The RS is where the client *uses* the token, not where it *gets* or *manages* the token. To me, this issue is about having additional functionality to allow an RS to signal things to the AS about how to manage the token at runtime. For all the exposures above, this reads more like an incident where the action would be taken by a person acting at the AS, not automatically by the system.
Also important to this: the RS will not have access to the keys used for token presentation, since those are held by the client, and so it can't do the same kind of management functions that a client could anyway.
Yep, sure, no issues with that. The major point
> I believe an RS has to have a way to revoke ATs for security reasons, say when ATs were "exposed" somewhere, by mistake or on purpose: for example, through log files/events or in SCM (Source Code Management) commits, and so forth.
is still a relevant one, right?
Yes, exposing a token management function at the AS to the RS is the topic of this issue, and it's a good question!
https://github.blog/2022-04-15-security-alert-stolen-oauth-user-tokens/ is just for the record.
So, the use case is: when the RS has mishandled the AT (it was stolen/exposed), the RS itself can request the revocation of the AT, instead of the client being notified and then acting upon that by revoking the AT. I think that this is actually useful.
Misuse of this ability of the RS should also be considered: "a compromised RS that is revoking all ATs it accepts". ATs can have multiple audiences, and revoking a token means access is denied at other RSs too. This puts emphasis on properly separating the security domains and assigning audiences for tokens in a flexible way, so that a misbehaving RS is not able to prevent access to different systems.
A few points here, some potentially needing a separate issue:
- IMO we should not support access tokens with multiple audiences. If a group of RSes is sharing ATs, it is up to them to deal with AT security as a group.
- I'm not sure of the benefit of having revocation as an AS endpoint. The AT is still cached on the client, and the client is likely to reuse it, and so the RS (or RS group) needs to have a local way to revoke ATs - a local blocklist. Given this capability, what's the added value of "revoking" the AT on the AS?
- On the other hand, if the Client somehow detects a mishandled AT (in particular for bearer tokens), there should be a way for it to inform the RS that the AT is no longer trusted. So yes, a /revoke endpoint on the RS would make sense.
Multiple audiences in an access token is the decision of the AS. This is extremely common. The most I think we should have is a security consideration (in the RS draft) about the tradeoffs with this.
I strongly disagree with having token management endpoints on the RS, especially exposed to the client. If the client thinks the token is compromised, it can revoke it at the AS using the existing token management endpoint (https://www.ietf.org/archive/id/draft-ietf-gnap-core-protocol-08.html#name-revoking-the-access-token), and it's up to the AS to inform any RS's about the new state of the token. The same would be true of any change of state of the token itself. I don't see value in directly exposing this same endpoint to the RS but it's worth discussing (and it might end up being a different endpoint for this function anyway).
Furthermore, the chances of the client detecting a compromise are, in my opinion, laughably thin. If anything, the client will revoke an access token only proactively when the client itself is done with it.
Shared signaling between the AS and any downstream RS's is an interesting topic. In the wild so far we've mostly seen it be passive, with the RS introspecting the "current state" of the token and going from there. Mechanisms for the AS or RS to push state to each other is interesting, but certainly much more advanced than what I've at least seen deployed. Furthermore, in a lot of systems it's completely unnecessary because the AS and RS don't communicate over a network protocol but instead have a shared data store, or they use the token itself to carry all state (with all the tradeoffs that comes with).
> I strongly disagree with having token management endpoints on the RS
I agree with this. The RS is not authoritative over the AT; the issuer of the token - the AS - is the only entity that can be trusted to provide the actual state of the token. By adding a revocation endpoint (or any token management capability) on the RS, we are essentially sharing the validity state of the AT across multiple entities. If we suppose that such an endpoint is in place, and the RS says that the AT is revoked, but the AS says that the AT is active, we have a very confusing conflict.
Sharing state over the network means we now have to deal with all problems of distributed systems. Especially when that state is critical for reasoning about the security properties of the token, this will quickly turn into a big mess.
> Shared signaling between the AS and any downstream AS's is an interesting topic.
I am guessing this should be "between the AS and any downstream RS's".
To me communication between AS's is also a very interesting but different topic.
> Mechanisms for the AS or RS to push state to each other is interesting
In this case, the AS would have a way to signal to the RS's that an AT has been revoked, and the RS's would in turn either keep this state for as long as the AT is not expired, or replace the cache entry for this AT with one that indicates that the AT is revoked/invalid/inactive. By itself, this sounds like an optimization of the passive introspection of the AT by the RS, but I think there's room to explore more communication patterns here.
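To make the push-based variant above concrete, here is a minimal Python sketch of the RS-side cache, assuming the RS keys cached state by token value and learns the token's expiry from the AS. `TokenStateCache`, `on_revocation_signal`, and every other name here are hypothetical and not taken from any GNAP draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TokenStateCache:
    """Hypothetical RS-side cache of token state, keyed by token value."""

    # token -> (active flag, expiry as seconds on the caller's clock)
    entries: Dict[str, Tuple[bool, float]] = field(default_factory=dict)

    def store(self, token: str, active: bool, expires_at: float) -> None:
        # Record the result of an introspection call.
        self.entries[token] = (active, expires_at)

    def on_revocation_signal(self, token: str, expires_at: float) -> None:
        # Assumed AS -> RS push: overwrite any cached "active" entry with
        # a revoked marker, kept only until the token would expire anyway.
        self.entries[token] = (False, expires_at)

    def lookup(self, token: str, now: float) -> Optional[bool]:
        # True/False is the cached state; None means "ask the AS".
        entry = self.entries.get(token)
        if entry is None:
            return None
        active, expires_at = entry
        if now >= expires_at:
            del self.entries[token]  # expired: the marker is no longer needed
            return None
        return active


cache = TokenStateCache()
cache.store("tok-1", active=True, expires_at=100.0)
cache.on_revocation_signal("tok-1", expires_at=100.0)
print(cache.lookup("tok-1", now=10.0))   # False
print(cache.lookup("tok-1", now=150.0))  # None
```

The revoked marker only has to live until the AT expires; after that, a lookup falls through to the AS, which would report the token as inactive anyway.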
> Multiple audiences in an access token is the decision of the AS. This is extremely common. The most I think we should have is a security consideration (in the RS draft) about the tradeoffs with this.
Even if this is very common, we don't have to endorse it or jump through hoops to enable it. ATs are cheap.
> I strongly disagree with having token management endpoints on the RS, especially exposed to the client. If the client thinks the token is compromised, it can revoke it at the AS using the existing token management endpoint (https://www.ietf.org/archive/id/draft-ietf-gnap-core-protocol-08.html#name-revoking-the-access-token), and it's up to the AS to inform any RS's about the new state of the token. The same would be true of any change of state of the token itself. I don't see value in directly exposing this same endpoint to the RS but it's worth discussing (and it might end up being a different endpoint for this function anyway).
You are not addressing my concern about informing the RS. We don't have (in this draft or the RS draft) a mechanism to reliably inform the RS about revocation.
> Furthermore, the chances of the client detecting a compromise are, in my opinion, laughably thin. If anything, the client will revoke an access token only proactively when the client itself is done with it.
I think revoking a token "when you're done with it" is a waste of time for the Client, the AS and for the spec writers. But compromise is sometimes detected (admittedly rarely), and we need to support revocation in a generic protocol because it's an important security guarantee, even if few people will use it. The AT can be compromised either on the Client or on the RS, so the revocation endpoint should be available to either of them.
> Shared signaling between the AS and any downstream AS's is an interesting topic. In the wild so far we've mostly seen it be passive, with the RS introspecting the "current state" of the token and going from there. Mechanisms for the AS or RS to push state to each other is interesting, but certainly much more advanced than what I've at least seen deployed. Furthermore, in a lot of systems it's completely unnecessary because the AS and RS don't communicate over a network protocol but instead have a shared data store, or they use the token itself to carry all state (with all the tradeoffs that comes with).
Even if many AS/RS pairs are collocated, we have to support this signaling as long as the protocol assumes they are separate entities. I suppose the easiest way to do it is to add an optional TTL value to the AS's token introspection response.
What I never really liked about "multiple audiences in an access token" is that it allows RS1 to take an AT and call RS2. In some cases that's totally fine, but imagine for a second a situation where RS1 and RS2 belong to different organizations. In other cases, you might discover at some point that this is completely undesirable even within the same organization, but there are already RSs that have been abusing it for a while. What do we/you do then?
I didn't want to mix these things with the current issue. For this one, I would just like to have some sort of revocation endpoint available to RSs for 911 scenarios, or maybe some other way around it, an explanation, etc.
I'm not sure what the issue with multiple audiences is. Audience restrictions are set based on a policy. If the policy says that RS1 and RS2 cannot share an AT, then a client will not get such a token. This is decided by the AS. The AS should encode these policies, know the topology and should control the communication channels within its security domain.
The RS can also have its own policies. After introspecting the token to get additional information that will help it make authorization decisions, it can deny access if its policy says that (any or certain) other audiences on the AT are not allowed.
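A rough sketch of such an RS-local policy check, in Python. It assumes the introspection response is a dict with an `active` flag and an `aud` member listing the audiences (field names modeled on OAuth 2.0 token introspection); the function name and the `tolerated_co_audiences` parameter are hypothetical.

```python
from typing import FrozenSet


def rs_accepts(introspection: dict, own_id: str,
               tolerated_co_audiences: FrozenSet[str] = frozenset()) -> bool:
    """Hypothetical RS-local check applied after introspecting the AT."""
    if not introspection.get("active", False):
        return False
    aud = introspection.get("aud", [])
    if isinstance(aud, str):  # tolerate a single audience given as a string
        aud = [aud]
    if own_id not in aud:
        return False
    # Deny if the AT is also valid at an RS this RS does not want to share with.
    return set(aud) - {own_id} <= tolerated_co_audiences


response = {"active": True, "aud": ["rs1", "rs2"]}
print(rs_accepts(response, "rs1"))                      # False
print(rs_accepts(response, "rs1", frozenset({"rs2"})))  # True
```

With the default empty set, the RS only accepts tokens addressed to itself alone, which is the strictest reading of the policy described above.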
In any case, I think this is my fault for mentioning multiple audiences on ATs. I put that out there as a potential issue that we may have to keep in mind. Ignoring the existing problems with OAuth2 and OIDC around how audience restrictions are requested and set on ATs, I just want to mention that there are more use cases with multiple audiences also involving token-exchange.
Having said all that, I think this is the wrong discussion to be having here. This issue is about the RS being able to revoke ATs, by communicating with the AS. And I think this makes sense; the RS should be able to inform the AS that an AT was exposed and that it should be revoked. It is useful functionality that would already be used.
There is a separate discussion about the RS exposing an endpoint, so that other entities (clients, ASs, RSs) can signal that an AT has been revoked.
What is the flow on the RS side supposed to look like? What we typically see in the OAuth2 space is that an RS gets an AT and then introspects it at the AS; then:
- if the token is already revoked (or other policies set by the AS should prevent the token from being used), the AS will respond with "AT is inactive". The RS should drop the request.
- if the token is good, the AS will respond with "AT is active and (maybe) here is some more information for authz decisions". At this point, from what I understand, there is an implicit assumption that this response is cached by the RS.
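The flow in the two bullets above, including the implicit caching assumption, can be sketched in Python as follows. This is only an illustration: `IntrospectingRS`, `fake_introspect`, and the dict-based toy AS are hypothetical stand-ins, and the introspection call is modeled as a plain function returning a dict with an `active` flag.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class IntrospectingRS:
    """Hypothetical RS that validates each AT by introspection at the AS."""

    introspect: Callable[[str], dict]  # stand-in for the RS -> AS call
    cache_ttl: float = 0.0             # seconds; 0 disables caching
    clock: Callable[[], float] = time.monotonic
    _cache: Dict[str, Tuple[float, dict]] = field(default_factory=dict)

    def accepts(self, token: str) -> bool:
        now = self.clock()
        hit = self._cache.get(token)
        if hit is not None and now - hit[0] < self.cache_ttl:
            response = hit[1]  # reuse the cached response
        else:
            response = self.introspect(token)
            if self.cache_ttl > 0:
                self._cache[token] = (now, response)
        # "AT is inactive" -> drop the request
        return bool(response.get("active", False))


# Toy AS: a dict of token -> active flag.
as_state = {"tok-1": True}


def fake_introspect(token: str) -> dict:
    return {"active": as_state.get(token, False)}


strict = IntrospectingRS(fake_introspect)
caching = IntrospectingRS(fake_introspect, cache_ttl=60.0)
print(strict.accepts("tok-1"), caching.accepts("tok-1"))  # True True
as_state["tok-1"] = False  # token revoked at the AS
print(strict.accepts("tok-1"), caching.accepts("tok-1"))  # False True
```

The second line of output is the window in question: the RS that introspects on every request rejects the revoked AT immediately, while the caching RS keeps accepting it until its own cache entry ages out.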
I think the behaviour that @yaronf is trying to fix is the RS caching the AS's introspection response. Whether this is allowed, and how long the cache entry lives, is for the RS to decide. I don't think the AS should ever dictate such a cache TTL; it will always be a guess.
From the protocol perspective this is a non-issue. The RS decided to optimize a call with the tradeoff of loosening its security. I don't think invalidation of the cache on the RS is a protocol concern. From a security perspective, the RS should be validating any AT before any use.
I am happy to be corrected and understand this more. I do feel I am missing some information to reason about this further. And I recognize that I am probably biased about the RS introspecting the AT.
btw, I think it's helpful to visualize all possibilities. Each connection probably deserves its own discussion. But, I think/hope we can agree that if a token is exposed, this fact should be communicated to the AS.
Whether it makes sense for the row entity to signal revocation of an AT to the column entity:

row signals to column | AS | RS | RP |
---|---|---|---|
AS | yes (advanced) | no (what I currently call cache-invalidation) | no |
RS | yes | no (what I currently call cache-invalidation) | no |
RP | yes | no (what I currently call cache-invalidation) | no (no peer trust) |
I will sleep on it.
Maybe one addition to this topic that came to mind, on the proposition to allow the RS to inform the AS that a token should be revoked: what is happening is that the RS is taking care of the rest of the system.
The RS itself knows that the AT has been exposed. The RS can choose to refuse the AT in any future request. But it should still communicate this to the AS. By doing that, other entities that may use the AT will know (through the AS, and probably through introspection) that they should not accept the token. And the AS will know that for itself (i.e., thinking in terms of OIDC and the userinfo endpoint).
At this stage it seems unwise to define an additional API for RS-facing token revocation functions at the AS. There are a lot of hidden problems here that would need to be thoroughly discussed and addressed. For example, if the RS calls to revoke a token and the AS does not do this for whatever reason, what should the RS do? And if a rogue RS could revoke a token (or an RS did so just by mistake), that could break an otherwise running client in a very unexpected way. All of these and more are tradeoffs and would need to be documented thoroughly. It also raises the question of what other changes an RS might be able to effect on an already-issued token (that was issued to another party). This is not a simple function. Furthermore, if security signaling is the driver, then other security signal frameworks are better suited for this as a general solution.
And finally, such an additional API could be defined by an extension in the future if there is implementation demand for an interoperable version of this.
With all of that, I recommend this issue be closed without changes, and the editors will do this unless there is a significant call from the group to add this function at this time.