zmartzone/lua-resty-openidc

Concurrent token refresh causes session to drop authentication.

Opened this issue · 10 comments

See also #190, which I think may ultimately have the same root cause.

lua-resty-openidc seems to assume that the locking mechanism in lua-resty-session keeps information stored in the session consistent when processing concurrent requests. It doesn't. I don't know what the design goals were for the locking mechanism implemented in lua-resty-session, but my conclusion after reviewing it is that either it needs to be fixed in lua-resty-session, which I believe is not possible without introducing breaking changes, or lua-resty-openidc should implement its own locking mechanism instead.

The effect of this is that concurrent updates to the session state may leave the session in bad shape. As the example below shows, it's easy to reproduce deterministically.

Here's what's wrong with the locking mechanism:

The general procedure is

  • Call session:start()
  • Make changes to session.data (optional)
  • Call session:save() or session:close()
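
In code, the pattern looks roughly like this (a minimal sketch against the lua-resty-session 2.x style API; the token field is illustrative):

local resty_session = require "resty.session"

local session = resty_session.start()  -- locks the session and reads its data
session.data.access_token = "..."      -- optional: modify session state
session:save()                         -- persist the changes and unlock
-- or, if nothing was changed:
-- session:close()                     -- unlock without writing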

According to the documentation, session:start() locks the session, which is true; however, what actually happens inside session:start() is this:

  1. Lock session or wait
  2. Session locked
  3. Read data
  4. Unlock session
  5. Lock session or wait
  6. Session locked
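
Rendered as pseudocode (a paraphrase of the six steps above, not the actual lua-resty-session source):

local function start(session)
    storage:lock(session.id)               -- 1, 2: lock session (may wait)
    local data = storage:read(session.id)  -- 3: read data
    storage:unlock(session.id)             -- 4: unlock session
    storage:lock(session.id)               -- 5, 6: lock session again
    session.data = data
    -- the caller now holds the lock, but the data was read while unlocked
    return session
end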

For argument's sake, let's assume we're using the redis storage adapter, in which case operations 1, 3, 4 and 5 each make a call to redis. These calls are asynchronous, which means they yield the current request's coroutine and let other requests run in the meantime. With two concurrent requests the sequence of events will be:

A                       B
Lock session or wait
                        Lock session or wait
Session locked
Read data
Unlock session
                        Session locked
Lock session or wait
                        Read data
                        Unlock session
Session locked
                        Lock session or wait

The problem is that when request B is allowed to proceed, it acts on data that was read before request A updated it. Before discussing what this should look like, let's complete the sequence:

A                       B
Update session data
Save and unlock session
                        Session locked
                        Update session data
                        Save and unlock session

Here's what this should look like:

A                                         B
Read data
                                          Read data
Decide if update is needed (yes)
Lock session or wait
                                          Decide if update is needed (yes)
                                          Lock session or wait
Session locked
Re-read data
Decide if update is still needed (yes)
Update data
Save and unlock session
                                          Session locked
                                          Re-read data
                                          Decide if update is still needed (no!)
                                          Close and unlock session
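
As pseudocode, this double-checked pattern would look something like the following (a hypothetical API; none of these helpers exist in lua-resty-session today):

local session = open_session_without_lock()  -- read data without locking
if needs_refresh(session.data) then          -- decide if update is needed
    session:lock()                           -- serialize writers
    session:reload()                         -- re-read data under the lock
    if needs_refresh(session.data) then      -- decide if still needed
        refresh_tokens(session.data)         -- update data
        session:save()                       -- save and unlock
    else
        -- another request already refreshed the token
        session:close()                      -- close and unlock
    end
end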

Please note that the fix for #190 (PR #209) introduces a session regenerate step when refreshing the access token. While that seemingly fixes #190, it becomes a problem here: requests A and B both arrive with the same session ID (from the cookie), but while request A is processed the old session is destroyed, a new session ID is generated and the cookie is updated. When request B gets to the point where the session is to be updated, it is still working with the old session, which was already destroyed while processing request A.

Environment
  • lua-resty-openidc 1.7.2
  • lua-resty-session 2.2.6, 3.0, 3.4
  • Keycloak 9.0.0
Expected behaviour

Concurrent requests should result in a single token refresh, and the session should be in a valid authenticated state when all requests have been processed.

Actual behaviour

Multiple token refresh calls happen, and the session is left in a non-authenticated state when all requests have been processed.

Minimized example

The problem is exposed by the OpenResty configuration below. It uses a local Keycloak instance set up to match the OpenResty configuration. For the sake of testing, set the access token lifespan as short as possible (1 minute) and allow refresh tokens to be re-used several times.
With session storage set to the shm or redis storage adapter, the problem is more severe than with cookie storage, as explained below.

To run the sample do the following:

  1. Navigate to http://localhost:8880/login.
  2. Log in with Keycloak.
  3. Your browser will be redirected to http://localhost:8880/ and show a page which is not protected by lua-resty-openidc but contains four <iframe> elements with protected content.
  4. Refresh the browser page to confirm that the session persists across a refresh.
  5. Wait for the access token to expire.
  6. Refresh the browser.

What you will see is that the first <iframe> shows the protected content as expected, but the others get a 401 response. From this point on, the login session is lost until you go back to the login page http://localhost:8880/login.

If lua-resty-session is configured to use cookie storage there is no problem, unless Keycloak is configured to allow a refresh token to be used only once. In that case the procedure above shows the same behavior, except that another browser refresh will bring the page back to a fully operational state (until the next time the access token expires) without visiting the login page.

Configuration and NGINX server log files

This set-up mimics a reverse-proxying gateway responsible for authentication and authorization on port 8880, which is backed by a resource server on port 8888. The example uses a single OpenResty instance, but the resource server on port 8888 could also be a separate OpenResty instance, or something else.

nginx.conf (secrets garbled):

error_log logs/error.log debug;
events {
    worker_connections 1024;
}
http {
    lua_shared_dict sessions 1m;
    server {
        listen 8880;
        set $session_storage shm;
        location / {
            access_by_lua_block {
                local openidc = require 'resty.openidc'
                local opts = {
                    discovery = "http://127.0.0.1:8080/auth/realms/test/.well-known/openid-configuration",
                    client_id = "test",
                    client_secret = "{b424a988-9f5f-4c01-843e-34f78764ad12}",
                    redirect_uri = "http://localhost:8880/redirect_uri"
                }
                local is_login = ngx.var.request_uri == "/login"
                local unauth_action = nil
                if not is_login then unauth_action = 'deny' end
                local res, err, target_uri, session = openidc.authenticate(opts, nil, unauth_action)
                session:close()
                if not res then
                    ngx.status = 401
                    ngx.say(err)
                    ngx.exit(401)
                end
                if is_login then
                    ngx.header["Cache-Control"] = "no-cache, no-store, max-age=0"
                    return ngx.redirect("/frames")
                end
            }
            proxy_pass http://127.0.0.1:8888;
        }
        location /frames {
            proxy_pass http://127.0.0.1:8888;
        }
    }
    server {
        listen 8888;
        location /frames {
            default_type text/html;
            content_by_lua_block {
                ngx.header["Cache-Control"] = "no-cache, no-store, max-age=0"
                ngx.say('<html><body>')
                ngx.say('<iframe src="/hello/1"></iframe>')
                ngx.say('<iframe src="/hello/2"></iframe>')
                ngx.say('<iframe src="/hello/3"></iframe>')
                ngx.say('<iframe src="/hello/4"></iframe>')
                ngx.say('</body></html>')
            }
        }
        location /hello {
            default_type text/plain;
            content_by_lua_block {
                ngx.say('Hello #' .. ngx.var.request_uri:match("/hello/(.*)") .. '!')
            }
        }
    }
}

I'm experiencing this problem as well, and agree with your explanation.

I've implemented a rough workaround that allows only one request with the potential to redirect to be in flight at any given time. It is implemented with a simple flag in a shared dictionary that is set whenever a potentially redirectable request is in flight. I set the flag to expire after 30 seconds to prevent permanent blockage of all authentication, so it's not perfect. (And I can expire the flag immediately if openidc.authenticate returns without a redirect.)
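
Roughly, the flag logic looks like this (a sketch; the dictionary and key names are illustrative, and a lua_shared_dict auth_flags 64k; declaration is assumed in the http block):

local flags = ngx.shared.auth_flags

-- add() only succeeds if the key is absent, so exactly one request wins;
-- the 30-second exptime keeps a dead request from blocking auth forever.
local ok = flags:add("redirectable_in_flight", true, 30)
if not ok then
    return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)  -- or wait and retry
end
local res, err = require("resty.openidc").authenticate(opts)
if res then
    flags:delete("redirectable_in_flight")  -- no redirect needed: clear now
end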

I actually don't think it is feasible to use locking to solve this. If multiple redirects are being queued up, you can't guarantee that the first one will actually be returned. So you have to send them all off and handle the fact that they might come back out of order. It's going to take some thought, I think, to figure out the best way to do that.

Actually, the refresh token reauths can be locked; it's the web redirects that can't.

I had to extend my workaround a bit further. I have two different locks: one for the web redirects and one for the silent redirects. The web redirects need the timed expiration for safety, but the silent reauths can just control the opts.renew_access_token_on_expiry flag so that only one in-flight request has it set to true at a time.
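
For the silent reauths, the idea is roughly this (again a sketch with illustrative names):

local flags = ngx.shared.auth_flags

-- Only the request that wins the flag may refresh the token; every other
-- in-flight request authenticates with the refresh behaviour disabled.
local i_am_refresher = flags:add("silent_reauth_in_flight", true, 30)
opts.renew_access_token_on_expiry = i_am_refresher and true or false
local res, err = require("resty.openidc").authenticate(opts)
if i_am_refresher then
    flags:delete("silent_reauth_in_flight")
end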

OK, so I've dialed in a locking method that is reasonably robust and seeks to ensure there is never more than one openidc.authenticate action in flight at any given time. It seems to work well enough with silent redirects.

It does come at a cost: if multiple requests come in during the authenticate cycle, some will skip it. But in my application, that's not an issue.

I did find, however, that when these controls are working successfully, session:regenerate() actually hurts us. When the id_token needs to be refreshed, the regenerate call forces a new session, and any calls that are in flight using the older session lose all session data. Again, in my application I can afford to let this go, and a silent reauth will refill the session data soon enough, but in other applications that might not be the case.

It would be better, I think, if we did not regenerate and instead handled the concurrent token refreshes in a different manner. Here's my thought. At least one of the affected lines is this:

https://github.com/zmartzone/lua-resty-openidc/blob/master/lib/resty/openidc.lua#L1067

This makes sure that the state recorded in the redirect URI matches the state in the cookie. I understand the CSRF concern makes this matching important. But what I would propose is that every time we generate a new session state, we push the old state onto a list, and allow a match against those as well. We can limit the size of this list to, say, 10, if there is a concern about eventually running out of memory. I think this also means removing states from this list here:

https://github.com/zmartzone/lua-resty-openidc/blob/master/lib/resty/openidc.lua#L1122
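
A sketch of what that could look like (hypothetical helpers; nothing like this exists in lua-resty-openidc today):

local MAX_OLD_STATES = 10

-- Whenever a new state is generated, remember the one it replaces.
local function push_state(session, new_state)
    local old = session.data.old_states or {}
    table.insert(old, 1, session.data.state)
    old[MAX_OLD_STATES + 1] = nil  -- cap the list to bound memory use
    session.data.old_states = old
    session.data.state = new_state
end

-- Accept the current state or any recently retired one.
local function state_matches(session, state)
    if state == session.data.state then return true end
    for _, s in ipairs(session.data.old_states or {}) do
        if s == state then return true end
    end
    return false
end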

Are there any updates on this? My setup seems to be similar: I am using redis and unauth_action "deny" for the tested location.

I experience problems as soon as two or more parallel requests occur while the access token needs to be refreshed. Promise.all([fetch(uri), fetch(uri)]) is really all it takes to reproduce. One request successfully refreshes the token, sets a new session cookie and ends with a success status (which is good). The other request(s) terminate with status 401 and unfortunately also create a new (unauthorized) session. The browser continues with the last session cookie, which is an unauthorized session.

How do other people deal with this? A workaround could be to strip the session cookie from unauthorized requests when using unauth_action "deny", but it feels wrong. Is there a better solution? (Edit: this workaround seems to work as intended. Unauthorized requests have to be retried by the app, which is OK in my case.)
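
For reference, the stripping workaround is roughly this (a sketch; it simply clears whatever Set-Cookie header the failed authenticate call produced):

local res, err = require("resty.openidc").authenticate(opts, nil, "deny")
if not res then
    ngx.header["Set-Cookie"] = nil  -- drop the freshly created session cookie
    ngx.status = 401
    ngx.exit(401)
end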

@hoebelix what will happen with parallel requests with the mentioned bug fix? Won't this result in multiple token refreshes?

@thorstenfleischmann yes, this results in multiple token refreshes. But at least for my use case, with server-side sessions stored in redis, this does not seem to be a problem.

Example: say I have a session with id 'A' and there are two token refreshes in parallel. After that, I have two renewed copies of this session with ids 'B' and 'C'. Only one of them is used by the browser for subsequent requests. The other, unused session gets garbage-collected by redis after some time.

@hoebelix and what is your timing strategy (TTL) for having redis remove your sessions? This is my problem: I have the same implementation, but I don't want to leave a lot of "garbage" in redis for so long (days).

Any news about it?
I have a similar issue.
I'm integrating a Ping IdP that is configured to allow a refresh token to be used only once.
I've set the flag renew_access_token_on_expiry=true.

In my single-page application, I have three different API calls performed concurrently.
When the token expires, the first API call enters openidc.authenticate and correctly refreshes the token.
The second and third calls receive a 400 response from the refresh token endpoint, and openidc.authenticate correctly returns an error; in my application, the behaviour for requests with an Accept: application/json header is to return a 401 status code.

So I'm receiving one correct response and two 401 responses in my web application, even though my token was correctly refreshed.
I've been struggling with this problem for several days. Is there a way to avoid concurrent refresh-token requests without overriding the resty openidc.lua module with custom code?

Thanks in advance for any help!!