Cache expiration race conditions
Closed this issue · 15 comments
The web and revision cache sweepers seem vulnerability to the race condition described here: https://gist.github.com/bricker/6255064
Basically, if user A edits a page and user B requests the page after the cache has been expired in after_safe
but before the transaction has been committed, then the old page could be rerendered and stay cached even after user A's save request is finished.
There are lots of other issues in the cache sweepers that I don't understand. As far as I can tell, wiki references store their source page as an id and their target page by name. Doesn't that mean that, when a page gets renamed, pages with references targetting the old or new name need to be expired?
The web and revision cache sweepers seem vulnerability to the race condition described here: https://gist.github.com/bricker/6255064
That looks better than what the cache sweeper currently do in an attempt to avoid such race conditions.
Doesn't that mean that, when a page gets renamed, pages with references targetting the old or new name need to be expired?
Hmmm.
- Pages referring to the old name are, of course, expired.
- But pages referring to the new name. which have a 'create page' link (since a page by that name did not previously exist) are not expired.
That's clearly a bug ...
Given that Rails 2 doesn't have on_commit
, perhaps it is easier (in the case of the revision sweeper) to directly expire the cache in the revise
method after both the page and the revision have been saved and other places. That makes it easier to:
- do the expiration at the correct time,
- get the old and new name of a page,
- distinguish between real page edits and page locking updates (which currently needlessly expire lots of things; in the case of the nlab, this led to an extra of ~1s when clicking on an edit button),
- distinguish between page creation, deletion, or edits.
Another minor problem (not a race condition):
The URL for a revision is valid even if the revision number exceeds the number of revisions. In that case, the served page is that of the current revision, which gets cached under the URL for the non-existing revision. When a page is deleted, its revisions are expired, but only numbering up to the current revision. So the "redirected" revision pages stay in the cache indefinitely.
I guess a proper redirect (or an error message) is a better way of handling invalid revision numbers.
Another minor problem (not a race condition):
The URL for a revision is valid even if the revision number exceeds the number of revisions. In that case, the served page is that of the current revision, which gets cached under the URL for the non-existing revision. When a page is deleted, its revisions are expired, but only numbering up to the current revision. So the "redirected" revision pages stay in the cache indefinitely.
I guess a proper redirect (or an error message) is a better way of handling invalid revision numbers.
This is fixed in Revision 90efa50.
Thanks.
Given that Rails 2 doesn't have
on_commit
, perhaps it is easier (in the case of the revision sweeper) to directly expire the cache in therevise
method after both the page and the revision have been saved and other places. That makes it easier to:
do the expiration at the correct time,
get the old and new name of a page,
distinguish between real page edits and page locking updates (which currently needlessly expire lots of things; in the case of the nlab, this led to an extra of ~1s when clicking on an edit button),
distinguish between page creation, deletion, or edits.
I'm not sure I understand the "page locking updates" comment. The cache sweeper has a before_save
hook and and an after_save
hook. Clicking on "edit" should have absolutely no effect whatsoever, vis-a-vis the cache. Only invoking the save
action should.
You're saying that locking the page causes the cache-sweeper to be run ?
I'll investigate ....
To be more precise, the revision_sweeper
does indeed observe
both the Page
and the Revision
models.
But the only changes on the Page
model that are watched for are create
and delete
. Updating the locked
attribute of a Page
has no effect. So you have me confused ...
You're saying that locking the page causes the cache-sweeper to be run ?
I'll investigate ....
I apologize, I was looking at the version of the cache sweeper in the nlab codebase when I the third point. There, the page/revision sweeper is very messed up and has this problem. The official Instiki version doesn't have it.
OK. So let me see if I understand
- All actual cache-expiration should be done in the
after_save
hook. In thebefore_save
hook, only promises should be made. - When a page's name is changed, pages that refer to the new name should also be expired (currently, they are not).
Have I understood correctly?
Regarding 1, despite the name the after_save
hook is run before the commit to the database is made (see point 1 in https://gist.github.com/bricker/6255064). One would need an after_commit
hook, which was added in Rails 3. But it seems there is a gem that backports it to Rails2: https://rubygems.org/gems/after_commit
Regarding 2, I think it's more than just name changes. What about redirect changes?
I was a little worried that the activerecord
and activesupport
gems installed by the after_commit
gem would conflict with the modified versions that Instiki bundles in vendor/rails
. But it seems to work (both under Ruby 2.7 and Ruby 3.0).
See 9fa625d.
P.S.: I have never run into this race condition in Instiki. Given the thoroughly-borked state of the nlab's cache-sweeper, it seems to me that this race condition was the least of your worries. But, as long as it doesn't cause problems, I am happy to use on_commit
.
I'm just stating theoretical problems for reference. It's up to you if it's worthwhile fixing. :)
But partly, I am using your edits to the cache sweeper as a guide to restoring proper functionality of the cache sweeper in the nlab. Unfortunately, it has the additional problem that it needs to deal with another level of caching introduced by the added Python part of the codebase.
I'm just stating theoretical problems for reference. It's up to you if it's worthwhile fixing. :)
Well, the on_commit
hook could simply be replaced by on_save
with (I expect) zero change in functionality. If there are any problems, I will just revert back to that.
It was, however, useful to think through the logic of the revision_sweeper
again, to clean up the other issues you mentioned.
Unfortunately, it has the additional problem that it needs to deal with another level of caching introduced by the added Python part of the codebase.
You're on your own there.
Let me know if there are any issues with the revised revision_sweeper.rb
Unfortunately, I don't have tests to cover that bit of functionality.
I think all of the issues raised in this thread have been addressed by recent commits to app/controller/cache_sweeping_helper.rb
and app/controllers/revision_sweeper.rb
.
If that's not the case, feel free to reopen.