🐛 Bug Report: Refreshing OAuth token... forever and stops working
freedit-dev opened this issue · 32 comments
It has occurred several times that my instance stops working, and the last log is always:
INFO redlib::oauth > [⌛] 86279s Elapsed! Refreshing OAuth token...
Restarting redlib always fixed it.
Maybe there should be a timeout for refreshing the token.
Thanks for your great work.
Weird. The refresh code is literally just logging in again, which is what happens at launch.
Lines 102 to 108 in 6c2579c
I suppose it could have something to do with the header cache not being cleared.
Can you reproduce this consistently? If so, can you try running off of this branch? https://github.com/redlib-org/redlib/tree/try_fix_oauth_refresh
Not sure if this is the same issue as #81
On the latest commit (fix_oauth_ratelimit):
The instance stops responding (no error log), and the last log is:
Jun 27 01:41:19 base redlib[4038062]: ERROR redlib::utils > Error page rendered: Reddit error 400 "null": "Bad Request"
Jun 27 01:41:57 base redlib[4038062]: ERROR redlib::utils > Error page rendered: Reddit error 400 "null": "Bad Request"
Jun 27 01:43:33 base redlib[4038062]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 27 01:43:33 base redlib[4038062]: INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
Jun 27 01:43:33 base redlib[4038062]: INFO redlib::oauth > Refreshing OAuth token... success
Jun 27 01:43:33 base redlib[4038062]: WARN redlib::client > Rate limit 8 is low. Spawning force_refresh_token()
Jun 27 01:43:33 base redlib[4038062]: INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
Jun 27 01:43:33 base redlib[4038062]: INFO redlib::oauth > Refreshing OAuth token... success
Jun 27 01:47:35 base redlib[4038062]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 27 01:47:35 base redlib[4038062]: WARN redlib::client > Rate limit 8 is low. Spawning force_refresh_token()
I think it needs a timeout for token retrieval.
And restarting fixed it.
Could be an issue with the atomic/concurrent nature of the token retrieval. I'd appreciate it if you could test after #158 to see if it makes any difference.
@sigaloid It hung again.
Jun 27 02:39:12 base redlib[4049503]: TRACE redlib::client > Ratelimit remaining: 11.0
Jun 27 02:39:12 base redlib[4049503]: TRACE redlib::client > Ratelimit used: 89
Jun 27 02:39:12 base redlib[4049503]: TRACE redlib::client > Ratelimit reset: 48
Jun 27 02:39:14 base redlib[4049503]: TRACE redlib::client > Ratelimit remaining: 10.0
Jun 27 02:39:14 base redlib[4049503]: TRACE redlib::client > Ratelimit used: 90
Jun 27 02:39:14 base redlib[4049503]: TRACE redlib::client > Ratelimit reset: 45
Jun 27 02:39:18 base redlib[4049503]: TRACE redlib::client > Ratelimit remaining: 9.0
Jun 27 02:39:18 base redlib[4049503]: TRACE redlib::client > Ratelimit used: 91
Jun 27 02:39:18 base redlib[4049503]: TRACE redlib::client > Ratelimit reset: 41
Jun 27 02:39:18 base redlib[4049503]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 27 02:39:18 base redlib[4049503]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
Maybe you should not use the async RwLock; use the std RwLock instead.
I don't think it's a synchronization issue. It syncs on mine when I simulate high load.
I don't know what exactly the problem is - you're saying it's hanging precisely after the line "TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99"? Nothing else?
If so, it sounds like you cannot retrieve new tokens via the automatic refresh for some reason. Which makes no sense - it generates one every time you start it, and a refresh should be no different from restarting the program.
here's an example of my log output:
TRACE redlib::client > Ratelimit remaining: 9.0
TRACE redlib::client > Ratelimit used: 91
TRACE redlib::client > Ratelimit reset: 103
WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
INFO redlib::oauth > Refreshing OAuth token... success
TRACE redlib::client > Ratelimit remaining: 8.0
TRACE redlib::client > Ratelimit used: 92
TRACE redlib::client > Ratelimit reset: 102
TRACE redlib::client > Ratelimit remaining: 5.0
TRACE redlib::client > Ratelimit used: 95
TRACE redlib::client > Ratelimit reset: 102
TRACE redlib::client > Ratelimit remaining: 7.0
TRACE redlib::client > Ratelimit used: 93
TRACE redlib::client > Ratelimit reset: 102
TRACE redlib::client > Ratelimit remaining: 6.0
TRACE redlib::client > Ratelimit used: 94
TRACE redlib::client > Ratelimit reset: 102
WARN redlib::client > Rate limit 6 is low. Spawning force_refresh_token()
TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
INFO redlib::oauth > Refreshing OAuth token... success
TRACE redlib::client > Ratelimit remaining: 3.0
Aside from the fact that a response may occasionally take longer to retrieve, and that the very small rate-limit values right before a refresh might accidentally overwrite the rate limit, my refreshing works just fine.
Yes, it's occasional. And it hangs exactly at TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99, nothing else. And restarting always fixed it.
Jun 27 03:36:31 base redlib[4054219]: WARN redlib::client > Rate limit 8 is low. Spawning force_refresh_token()
Jun 27 03:36:31 base redlib[4054219]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
Jun 27 03:36:31 base redlib[4054219]: INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
Jun 27 03:36:31 base redlib[4054219]: INFO redlib::oauth > Refreshing OAuth token... success
Jun 27 03:36:31 base redlib[4054219]: TRACE redlib::client > Ratelimit remaining: 7.0
Jun 27 03:36:31 base redlib[4054219]: TRACE redlib::client > Ratelimit used: 93
Jun 27 03:36:31 base redlib[4054219]: TRACE redlib::client > Ratelimit reset: 208
Jun 27 03:36:34 base redlib[4054219]: WARN redlib::client > Rate limit 7 is low. Spawning force_refresh_token()
Jun 27 03:36:34 base redlib[4054219]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
Jun 27 03:36:34 base redlib[4054219]: INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
Jun 27 03:36:34 base redlib[4054219]: INFO redlib::oauth > Refreshing OAuth token... success
Jun 27 03:36:34 base redlib[4054219]: TRACE redlib::client > Ratelimit remaining: 99.0
Jun 27 03:36:34 base redlib[4054219]: TRACE redlib::client > Ratelimit used: 1
Jun 27 03:36:34 base redlib[4054219]: TRACE redlib::client > Ratelimit reset: 205
Jun 27 03:36:39 base redlib[4054219]: TRACE redlib::client > Ratelimit remaining: 99.0
I just wanted to say that maybe we should have better logs to diagnose whether it is caused by the async RwLock or a token-retrieval timeout.
I'll try with a timeout.
trace!("Sending OAuth token request...");
let work = client.request(request);
let resp = match tokio::time::timeout(Duration::from_secs(10), work).await {
    Ok(result) => result.ok()?,
    Err(_) => {
        error!("Timeout: no response in 10 seconds.");
        return None;
    }
};
trace!("Received response from OAuth token request...");
It looks like some kind of deadlock.
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit remaining: 10.0
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit used: 90
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit reset: 77
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit remaining: 7.0
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit used: 93
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit reset: 76
Jun 27 15:58:43 base redlib[4141438]: WARN redlib::client > Rate limit 7 is low. Spawning force_refresh_token()
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::oauth > Trying to refresh OAuth token...
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::oauth > Sending OAuth token request...
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit remaining: 9.0
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit used: 91
Jun 27 15:58:43 base redlib[4141438]: TRACE redlib::client > Ratelimit reset: 77
Jun 27 15:58:44 base redlib[4141438]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Two force_refresh_token() calls happened.
Another one caught:
Jun 27 16:08:25 base redlib[4142262]: TRACE redlib::client > Ratelimit used: 92
Jun 27 16:08:25 base redlib[4142262]: TRACE redlib::client > Ratelimit reset: 95
Jun 27 16:08:25 base redlib[4142262]: TRACE redlib::client > Ratelimit remaining: 9.0
Jun 27 16:08:25 base redlib[4142262]: TRACE redlib::client > Ratelimit used: 91
Jun 27 16:08:25 base redlib[4142262]: TRACE redlib::client > Ratelimit reset: 95
Jun 27 16:08:27 base redlib[4142262]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::oauth > Trying to refresh OAuth token...
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::oauth > Sending OAuth token request...
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::client > Ratelimit remaining: 7.0
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::client > Ratelimit used: 93
Jun 27 16:08:27 base redlib[4142262]: TRACE redlib::client > Ratelimit reset: 92
Jun 27 16:08:27 base redlib[4142262]: WARN redlib::client > Rate limit 7 is low. Spawning force_refresh_token()
[quotes the earlier "here's an example of my log output" comment and its logs]
Even on success, there are two refreshes at the same time.
That shouldn't be a problem, though. Independent threads can of course reach Reddit simultaneously; I don't know why it would block in this case. There's nothing wrong with overwriting the token with a new one twice in succession, and the RwLock will stop any simultaneous modification.
Pushing a PR that should avoid the simultaneous-token case, but I'm unsure if this will fix the problem. It rolls over on my machine under heavy load, but then again, so does main right now...
@freedit-dev @ButteredCats I know you both experienced this issue; please let me know if #160 helps. latest at quay.io has the newest version.
Jun 28 03:49:51 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 62.0, we have 62.
Jun 28 03:49:51 base redlib[25514]: TRACE redlib::client > Ratelimit used: 38
Jun 28 03:49:51 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 8
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 61.0, we have 61.
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit used: 39
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 6
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 60.0, we have 60.
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit used: 40
Jun 28 03:49:53 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 6
Jun 28 03:49:55 base redlib[25514]: ERROR redlib::utils > Error page rendered: Nothing here
Jun 28 03:50:03 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 99.0, we have 59.
Jun 28 03:50:03 base redlib[25514]: TRACE redlib::client > Ratelimit used: 1
Jun 28 03:50:03 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 597
Jun 28 03:50:05 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 98.0, we have 58.
Jun 28 03:50:05 base redlib[25514]: TRACE redlib::client > Ratelimit used: 2
Jun 28 03:50:05 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 595
Jun 28 03:50:06 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 97.0, we have 57.
Jun 28 03:50:06 base redlib[25514]: TRACE redlib::client > Ratelimit used: 3
Jun 28 03:50:06 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 594
Jun 28 03:50:07 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 96.0, we have 56.
Jun 28 03:50:07 base redlib[25514]: TRACE redlib::client > Ratelimit used: 4
Jun 28 03:50:07 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 593
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 95.0, we have 55.
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit used: 5
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 591
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 94.0, we have 54.
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit used: 6
Jun 28 03:50:08 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 591
Jun 28 03:50:09 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 93.0, we have 53.
Jun 28 03:50:09 base redlib[25514]: TRACE redlib::client > Ratelimit used: 7
Jun 28 03:50:09 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 590
Jun 28 03:50:10 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 92.0, we have 52.
Jun 28 03:50:10 base redlib[25514]: TRACE redlib::client > Ratelimit used: 8
Jun 28 03:50:10 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 589
Then it goes back to normal:
Jun 28 03:51:57 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 50.0, we have 10.
Jun 28 03:51:57 base redlib[25514]: TRACE redlib::client > Ratelimit used: 50
Jun 28 03:51:57 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 482
Jun 28 03:51:59 base redlib[25514]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 28 03:51:59 base redlib[25514]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 8
Jun 28 03:51:59 base redlib[25514]: INFO redlib::oauth > [✅] Success - Retrieved token "eyJhbGciOiJSUzI1NiIsImtpZCI6IlNI...", expires in 86399
Jun 28 03:51:59 base redlib[25514]: INFO redlib::oauth > Refreshing OAuth token... success
Jun 28 03:51:59 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 49.0, we have 9.
Jun 28 03:51:59 base redlib[25514]: TRACE redlib::client > Ratelimit used: 51
Jun 28 03:51:59 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 480
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 99.0, we have 99.
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit used: 1
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 480
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 98.0, we have 98.
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit used: 2
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 479
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 96.0, we have 97.
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit used: 4
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 479
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 97.0, we have 96.
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit used: 3
Jun 28 03:52:00 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 479
Jun 28 03:52:02 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 95.0, we have 95.
@sigaloid So far so good. But I want to say the async RwLock is not a good idea. Token refresh is not a frequent request, and the current solution (async RwLock) is too complex.
From the tokio docs (https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html#method.read):
Note that under the priority policy of RwLock, read locks are not granted until prior write locks, to prevent starvation. Therefore deadlock may occur if a read lock is held by the current task, a write lock attempt is made, and then a subsequent read lock attempt is made by the current task.
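The failure mode in that note can be sketched with the standard library's RwLock (a minimal, hypothetical illustration; only step 2 is demonstrated here, since tokio's fair queueing is what turns the subsequent same-task read in step 3 into an actual self-deadlock):

```rust
use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(0u32);

    // 1. The current task holds a read guard.
    let read_guard = lock.read().unwrap();

    // 2. While any read guard is alive, a writer cannot acquire the lock.
    assert!(lock.try_write().is_err());

    // 3. Under tokio's fair RwLock, a *second* read() from this same task
    //    would now queue behind that pending writer, so the task waits on
    //    a writer that is waiting on the task's own first guard: deadlock.
    drop(read_guard);

    // Once all read guards are dropped, the writer proceeds.
    assert!(lock.try_write().is_ok());
    println!("writer was blocked while a read guard was held");
}
```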
Hm. There's something I could possibly do to fix that - create a new Oauth struct and only call write to do an atomic swap, so the write lock is held for far less time (just the time to do the swap, not the entire refresh call). Is that the kind of thing you mean?
Otherwise, I'm not sure of a better solution besides a resource pool, and that's a lot of overhead IMO vs. just fixing the in-place solution until it's performant.
An async lock is not simply an asynchronous version of the std RwLock and is much more complex to deal with. Maybe you could try arc-swap instead.
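For reference, the swap-only-under-write idea could look like the following std-only sketch (the Oauth struct and token strings here are hypothetical stand-ins; the arc-swap crate's ArcSwap offers the same load/store pattern without holding any lock at all):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for redlib's real OAuth state.
struct Oauth {
    token: String,
}

// Build the replacement entirely outside the lock (the network round-trip
// would happen here), then hold the write lock only for the pointer swap.
fn refresh(shared: &RwLock<Arc<Oauth>>, new_token: &str) {
    let fresh = Arc::new(Oauth { token: new_token.to_string() });
    *shared.write().unwrap() = fresh; // write guard lives for one assignment
}

fn main() {
    let shared = RwLock::new(Arc::new(Oauth { token: "boot-token".to_string() }));

    // Readers clone the Arc and drop the read guard immediately, so no
    // request ever holds the lock across a long-running operation.
    let snapshot = Arc::clone(&shared.read().unwrap());
    assert_eq!(snapshot.token, "boot-token");

    refresh(&shared, "refreshed-token");
    assert_eq!(shared.read().unwrap().token, "refreshed-token");

    // In-flight requests keep their old snapshot alive until they finish.
    assert_eq!(snapshot.token, "boot-token");
    println!("token swapped without holding the lock during the refresh");
}
```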
#160 doesn't work:
Jun 28 05:03:44 base redlib[25514]: TRACE redlib::client > Ratelimit used: 89
Jun 28 05:03:44 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 375
Jun 28 05:03:45 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 10.0, we have 10.
Jun 28 05:03:45 base redlib[25514]: TRACE redlib::client > Ratelimit used: 90
Jun 28 05:03:45 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 374
Jun 28 05:03:47 base redlib[25514]: WARN redlib::client > Rate limit 9 is low. Spawning force_refresh_token()
Jun 28 05:03:47 base redlib[25514]: TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 8
Jun 28 05:03:47 base redlib[25514]: TRACE redlib::client > Ratelimit remaining: Header says 9.0, we have 9.
Jun 28 05:03:47 base redlib[25514]: TRACE redlib::client > Ratelimit used: 91
Jun 28 05:03:47 base redlib[25514]: TRACE redlib::client > Ratelimit reset: 372
It doesn't roll over the tokens?
The logging order doesn't really matter; that's not necessarily indicative of an issue. But you're saying it hangs again?
@sigaloid Right, it hung again even with the latest commit.
Maybe we should see these again
I just added more logging, and it's stuck here:
Line 74 in 3bd8b51
And restarting fixed it.
[quotes the earlier "I'll try with a timeout" comment and its timeout snippet]
Even the timeout doesn't work.
I updated and mine hung again within a minute, didn't seem to last any longer like freedit's.
When I can I'll enable the extra debugging and see if I have any logs to add, but that won't be until tonight and I doubt there's anything different from what freedit has sent.
Restarting always fixes it, so my instance auto-restarts every 20 minutes.
Yeah, the issue is that the rollover doesn't work once the rate limit is hit. But the new token generation works at startup.
@freedit-dev, you mentioned arc-swap. I should have time tonight to push out a commit that switches to that. I assumed RwLock for the OAuth client would be the best solution, but honestly I'm not sure it is anymore. I don't know why it's not working, but I seriously doubt it's because of performance. CPU-based atomics are plenty to handle this workload of a few dozen requests a minute. On that note, what is your CPU architecture?
Seems to have fixed it for me!
Splendid to hear!!
So far so good. Thanks!
Amazing. Glad I was finally able to fix it on the third go-around; huge thanks to @freedit-dev for recommending ArcSwap. I bet it was exactly the deadlock you mentioned. 🤔 I did limit the scope to as small as possible, but ArcSwap is the better long-term solution. Thanks again @ButteredCats and @freedit-dev for helping me debug this, as it really only showed up under heavy load! Closing :)