dotnet/aspnetcore

Refreshing auth tokens for SignalR

analogrelay opened this issue · 42 comments

Below is one option we've considered, but I'm re-framing this issue to build some story for refreshing auth tokens.

To improve the ability to "refresh" expired tokens, we should consider caching the access token provided by the factory. Then, when an HTTP request gets a 401, we call the factory again before re-issuing the request. That way the user can configure a process to "refresh" the token without forcing the connection to be reestablished

  • For the WebSockets transport, this has no effect. There is only ever a single request. This logic would not cover reconnecting in the event of something like #1159 (where the WebSocket is terminated when the token expires)
  • For the SSE transport, this only affects POST (send) requests. We would call the token factory again and re-issue the send. The unsent data would stay buffered in the pipe
  • For Long Polling, this affects POST requests like SSE, and the GET (poll) requests. The client would assume that a 401 error indicates that the data is still in the pipe for them to read. The server would be expected to keep data in the pipe in the case of a 401

We need to make sure that if two requests are outstanding simultaneously, the access token factory is only called once. So we should use a shared component and lock properly

Also need to make sure any new requests that need to get sent wait while the access token is refreshed.

The same behavior should happen while starting a connection. Requests that need to get sent with an access token should wait while a single call to the factory is made, and then get sent out once the token is ready.
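
The caching and retry behavior described above could look roughly like the following TypeScript sketch. It is not the SignalR client's implementation: `CachedTokenSource`, `sendWithRefresh`, and the `Send` callback are hypothetical names, and `Send` stands in for a single HTTP request (an SSE or Long Polling POST, or a poll GET). What it tries to show is that concurrent callers share one in-flight call to the factory, both while starting and after a 401.

```typescript
// Hypothetical names throughout; a sketch, not the SignalR client's code.
type TokenFactory = () => Promise<string>;
type Send = (token: string) => Promise<{ status: number }>;

class CachedTokenSource {
  private token: string | undefined;
  private pending: Promise<string> | undefined;

  constructor(private readonly factory: TokenFactory) {}

  // Returns the cached token, or starts/joins a single factory call.
  getToken(): Promise<string> {
    if (this.token !== undefined) return Promise.resolve(this.token);
    return this.refresh();
  }

  // Replaces a stale token. Concurrent callers share one factory call.
  refresh(staleToken?: string): Promise<string> {
    // Another request already replaced the stale token: reuse the newer one.
    if (staleToken !== undefined && this.token !== undefined && this.token !== staleToken) {
      return Promise.resolve(this.token);
    }
    if (this.pending === undefined) {
      this.token = undefined;
      this.pending = this.factory().then(
        (t) => { this.token = t; this.pending = undefined; return t; },
        (err) => { this.pending = undefined; throw err; },
      );
    }
    return this.pending;
  }
}

// Sends a request; on a 401, refreshes the token and re-issues the request once.
async function sendWithRefresh(source: CachedTokenSource, send: Send): Promise<number> {
  const token = await source.getToken();
  const first = await send(token);
  if (first.status !== 401) return first.status;
  const fresh = await source.refresh(token);
  return (await send(fresh)).status;
}
```

The shared `pending` promise plays the role of the lock: new requests that arrive during a refresh wait on the same promise instead of calling the factory again.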

I'm assuming the behavior between C# and TS should be the same.

@SteveSandersonMS @rynowak FYI, just something that popped up in our backlog grooming and @BrennanConroy mentioned this was something you cared about. So we put our special happy label on it.

Expanding this to also cover the possibility of in-band refresh of the token. We may want to build a way to refresh the user principal without terminating the connection.

Thanks for contacting us.
We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

We should author a doc about handling auth expiration correctly. See #5283 (comment) for a code example.

@bradygaster asked me to add @IEvangelist

Notes:
Add an option to enable automatic refresh of connection on token expiration (note: this closes the connection)

  • Store some sort of auth information that has expiration on it on the transport
  • Use IConnectionLifetimeNotificationFeature from heartbeat on expiration to tell SignalR to abort connection (with retry allowed)
  • If expiration of token happens, can we first run auth again to check if the token hasn't had its expiry updated?
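
As a rough model of these notes, the TypeScript sketch below stores an expiration per connection and checks it from a heartbeat tick. Everything here is hypothetical (`TrackedConnection`, `heartbeat`, `Reauthenticate`); the real server is ASP.NET Core, where the close would go through `IConnectionLifetimeNotificationFeature` rather than a boolean flag. The optional `reauth` callback stands in for the third bullet: running auth again to see whether the expiry was extended.

```typescript
// Hypothetical model of the notes above; not ASP.NET Core's implementation.
interface TrackedConnection {
  id: string;
  expiresAtMs: number;     // auth expiration stored alongside the connection
  closeRequested: boolean; // stands in for asking SignalR to close (retry allowed)
}

// Returns a new expiry if re-running auth extends it, otherwise undefined.
type Reauthenticate = (conn: TrackedConnection) => number | undefined;

// Called on every heartbeat; returns the ids of connections it asked to close.
function heartbeat(
  conns: TrackedConnection[],
  nowMs: number,
  reauth?: Reauthenticate,
): string[] {
  const closed: string[] = [];
  for (const conn of conns) {
    // Skip connections already closing or not yet expired.
    if (conn.closeRequested || nowMs < conn.expiresAtMs) continue;
    // Before closing, check whether the expiry has been updated.
    const renewed = reauth ? reauth(conn) : undefined;
    if (renewed !== undefined && renewed > nowMs) {
      conn.expiresAtMs = renewed;
      continue;
    }
    conn.closeRequested = true;
    closed.push(conn.id);
  }
  return closed;
}
```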

Not checking the expiration of the access token in SignalR is a huge security risk. And it adds some work for the application developer to make sure the access token is always fresh. I think it makes sense to add it to the SignalR library, where the developer can opt in to the feature #14578
I don't like the approach of closing the connection to then refresh the access token and then connect with the new access token. It does the job, but it's slow and hacky. There is also the possibility that the backend gets confused because of a completely new connectionId.

Furthermore I see a second problem arising, which is not solved by keeping the access token fresh. And that is changing roles within the access token.
For that let me give an example.

You have a privileged user that manages other users in your application. The privileged user then removes a role from one of the users he manages (e.g. removes the role that allows commenting on your blog post). Assuming this communication is via SignalR and the access token is still valid (not expired), the user is still authorized to perform the task (commenting on your blog post) he was just removed from. In the worst case, the access token was just refreshed and the user can abuse his roles for up to EXPIRATION_TIME (e.g. 4 minutes).

This is a more complex problem to solve than just making sure that the token is always valid, because the Identity Provider needs to inform you that the roles of a user were updated.

To sum this up.
I see two problems with SignalR and authorization. The first is that the access token is never looked at after the initial connection, and the second is updating roles while a connection is live.
I think we should focus on the first one (keeping the access token fresh). I think it would already help a lot of developers if SignalR with WebSockets supported this feature.

@BrennanConroy killing the connection in case of an expired refresh token sounds like a blunt hack to be honest. Everything that relates to the connection would be thrown away, while what we actually want to achieve is a "cache invalidation" on the side of the hub with the support of the client. Couldn't we think about it from the "bi-directional communication feature" perspective of SignalR itself?

  1. Hub informs clients some seconds (configurable) before the access token will become invalid. Either because of a timer or because of the heartbeat mechanism. The hub has to know about it, because it persists/caches the initial access token and relates it to the connection itself.
  2. Clients can react to this event, that is triggered by the hub to then re-send their access token (which should be refreshed by their local access token handling mechanism anyways) to allow the hub to update the authorization properties related to the client.
  3. The hub receives the updated token from the client and overwrites all related ClaimsPrincipal properties for the sending client, and therefore updates its internally cached, connection-related state to reflect the latest access token content.
  4. Because of this update, if we remove a role from a client (as described by @SebastianKunz, for example), this change will be reflected with the next access token refresh cycle, which usually should take 5 minutes maximum.
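
The four steps above can be modeled in a few lines of TypeScript. This is only a sketch of the proposed flow, not an existing SignalR feature: `HubAuthState`, `Principal`, and `ValidateToken` are hypothetical names, and token validation is injected as a callback because the thread does not specify how the hub would validate a re-sent token.

```typescript
// Hypothetical model of the proposed in-band refresh; not a SignalR API.
interface Principal {
  roles: string[];
  expiresAtMs: number;
}

// Returns the principal for a valid token, or undefined if the token is rejected.
type ValidateToken = (token: string) => Principal | undefined;

class HubAuthState {
  private readonly principals = new Map<string, Principal>();

  constructor(
    private readonly validate: ValidateToken,
    private readonly warnBeforeMs: number, // configurable warning window
  ) {}

  // Step 1: connections whose token expires within the warning window.
  // The hub would send these clients a "please refresh" event.
  dueForRefresh(nowMs: number): string[] {
    const due: string[] = [];
    this.principals.forEach((p, id) => {
      if (p.expiresAtMs - nowMs <= this.warnBeforeMs) due.push(id);
    });
    return due;
  }

  // Steps 2-3: the client (re-)sends a token; the cached principal is overwritten.
  // Also used for the initial connection.
  update(connectionId: string, token: string): boolean {
    const principal = this.validate(token);
    if (principal === undefined) return false;
    this.principals.set(connectionId, principal);
    return true;
  }

  // Step 4: authorization checks see the roles from the latest token.
  isInRole(connectionId: string, role: string): boolean {
    const p = this.principals.get(connectionId);
    return p !== undefined && p.roles.indexOf(role) >= 0;
  }
}
```

With this shape, a removed role takes effect at the next refresh cycle, because `update` replaces the whole cached principal instead of only extending its expiry.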

Apologies, I realize I left out some critical information when jotting down notes. Closing the connection after token expiration is one part of what we plan on implementing. We do want to have a nicer experience where the token can be refreshed without closing the connection. That does require a lot more thought and care and will be additive with the expiration check.

The current plan is to add just the expiration part in 6.0 (opt in) and then build on top of that and design the experience for refresh for a future release.

6.0 work is done, there is a new option to close connections on auth expiration. See #32431 for info.

Backlogging for future work in 7.0

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

Thanks for contacting us.

We're moving this issue to the .NET 7 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

In addition to the security flaw mentioned before, there is another problem. In a heavily loaded SignalR system, disconnecting and reconnecting can lead to a lot of messages being lost. Disconnecting and reconnecting are expensive operations that should be avoided.

This issue has come up in many places. I'd guess that a lot of solutions based on SignalR use JWT authentication with refresh tokens.

I am testing the CloseOnAuthenticationExpiration feature. It works. On my development machine, with all resources free, it takes about 1 second to process all operations:
[2022-02-02T23:19:24.859Z] Information: Connection disconnected.
[2022-02-02T23:19:25.904Z] Information: WebSocket connected to....

A lot can happen in 1 second, and on a high-load system with a lot of users and messages this can lead to lost messages.

Below are individual items to consider for refreshing auth tokens for SignalR:

  • Fix error message on client side for OnClose (.NET7):
      • Add info to CloseMessage that auth is expired and make it easy for clients to see that (strongly typed exception) and request a new token if required. This should include the error message and status code (401 for example).
  • Immediate easy ™ win for (.NET7):
      • Change AccessTokenFactory to only be called on startup and on 401/403 and retry the request, not for every single http request
  • Doc improvement (.NET7):
      • Add doc on how users can setup an endpoint to refresh their token on the server
  • Protocol change (.NET8):
      • Add new hub message type for "pre-close" that's effectively an auth challenge and has a grace period for a response before killing connection
      • Could make use of Hub message headers to flow auth info to server instead.
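
To illustrate the first item, a strongly typed close error on the client might look something like the TypeScript sketch below. The names `AuthenticationExpiredError`, `CloseInfo`, `toCloseError`, and `shouldRequestNewToken` are hypothetical, as is the idea that the close message carries a `statusCode` field; none of this is an existing SignalR API.

```typescript
// Hypothetical error type; the SignalR client does not define this today.
class AuthenticationExpiredError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    // Keeps instanceof working when compiling to older JavaScript targets.
    Object.setPrototypeOf(this, AuthenticationExpiredError.prototype);
    this.name = "AuthenticationExpiredError";
    this.statusCode = statusCode;
  }
}

// Hypothetical shape of a close message that carries auth information.
interface CloseInfo {
  error?: string;
  statusCode?: number;
}

// Maps the close message to an error the application can inspect by type.
function toCloseError(info: CloseInfo): Error | undefined {
  if (info.statusCode === 401) {
    return new AuthenticationExpiredError(info.error || "Authentication expired.", 401);
  }
  return info.error ? new Error(info.error) : undefined;
}

// An onclose handler could use this to decide whether to fetch a new token
// before reconnecting.
function shouldRequestNewToken(err: Error | undefined): boolean {
  return err instanceof AuthenticationExpiredError;
}
```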

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

This is a popular issue, so I figured I'd share some updates. The team is looking at ways to make sure there's a new auth token that represents the updated user information. Since SignalR isn't a simple request-response based system, it's challenging, and anything we do here will require protocol changes (that means new client and server changes).

There are 2 approaches being considered:

  1. A way to renew user information using the transport protocol (HTTP in the common case)
  2. A way to renew the user information in the hub protocol

Doing auth at the transport layer lets us reuse all of the ASP.NET Core based authentication handlers so it's attractive. Doing it over the transport means we need to encode the auth representation in the hub protocol and then have code on the server side that understands how to "unpack" the token (the equivalent of auth handlers but for SignalR specifically).

These are the forms of auth being looked at:

  • Cookie
  • JWT (generally bearer tokens)
  • Cert
  • API-Key

I'd be happy if we could control when the accessTokenFactory is called to retrieve a new access token. For custom authentication I would like to send a new auth token on every hub method call, for reasons too elaborate to go into detail here. Sure, I could do it through method parameters, but that's ugly, especially when there is an elegant way to do it on initial connection. Then on the back end we could develop more unified ways to custom handle those tokens, update identity, and so on.

@Gruski SignalR can't get a new token, the issue is about how to force the accessTokenFactory to run more than once per connection establishment for long running connections.

Has this been addressed in .NET 8? At the moment, when getting a new token with a refresh token, I can only disconnect and reconnect with the new token to trigger the whole auth procedure in the SignalR backend. Isn't this a big issue? Can't we just implement a client-side function that sends some magic word with the token and reauthenticates the connection, or changes a URL param?

I agree, this needs to be implemented in .NET 8. I'm having the same issues.

There are at least 3 issues opened on this topic. Any movement in this direction, or at least feedback?

Misiu commented

This issue should be moved to .NET 9 Planning as .NET 8 will be released soon. Hopefully, it will get addressed in .NET 9

Given how often it has been moved to the next iteration without any visible update, I doubt that .NET 9 will be our saviour. It's been almost two years since I last used it and had to work around this issue. Now, again, I really thought they might have fixed it by now. But it looks like even ASP.NET Core and SignalR aren't important enough projects for MS to fix critical issues. Not to mention how this issue could be overlooked in the first place. Sad.

As far as I am aware, there isn't even a clear idea of how to solve this from an architecture point of view.

Has this issue's milestone been moved to .NET 8 planning?

It's just too sad that SignalR suffers from this issue. For months I hoped that they would fix it, but I guess it will never be fixed. My impression is that all these Microsoft projects suffer from the same problem: the team behind each one doesn't have the resources to tackle all the bleeding wounds in the project. They try to fix them, but it's simply too much work for those people. That's my opinion, based on experience working on multiple .NET MAUI and ASP.NET Core projects. I have reported over 12 issues and followed a lot more already-reported issues that I also encountered in those projects. And a lot of those issues are still present in the latest versions.

@mkArtakMSFT, guess you have some insider information. Are you going to tackle this issue?

Hopefully in .NET 9

@davidfowl, is there any new information regarding the 2 approaches? Have you talked about it?

Hopefully in .NET 9

That'd be great. SignalR itself is a great tool and usually works really well. But this issue is a huge deal breaker and always was. It's about time this gets fixed.

Maybe the community can help with that.

Came across this stackoverflow workaround. It is working for us for our self hosted service. Is this a viable solution for now?

Well, this means stopping and reconnecting... which is the only solution for now.

@davidfowl, is there any new information regarding the 2 approaches? Have you talked about it?

Yes, we would go with an "over the transport" approach. Which means protocol changes to accommodate doing auth without a reconnect.

That's great. Is it already on the roadmap?

Any news on this topic?

Misiu commented

So maybe .NET 10?

At a minimum

Wow, that's crazy... too bad there aren't many good alternatives to SignalR as of now...

At a minimum

The earliest we'll see it is in the next version? How about a maximum? 😆

At a minimum

The earliest we'll see it is in the next version? How about a maximum? 😆

Sadly, the fact is that Microsoft does not invest enough in development compared to how many technologies they have invented and are maintaining.

The best course is to build the solution yourself or, even better, to avoid such incomplete frameworks.