Broken page cache expiration logic
sattlerc opened this issue · 9 comments
This is currently the logic for expiring a page:
def expire_caches(page)
expire_cached_summary_pages(page.web)
pages_to_expire = [page.name] +
WikiReference.pages_redirected_to(page.web, @will_expire) +
WikiReference.pages_that_include(page.web, @will_expire)
unless (page.name == @will_expire)
pages_to_expire.concat ([@will_expire] +
WikiReference.pages_that_link_to(page.web, @will_expire) +
WikiReference.pages_that_reference(page.web, page.name))
end
pages_to_expire.uniq.each { |page_name| expire_cached_page(page.web, page_name) }
end
But shouldn't it be as follows?
Notation for a set of pages X:
- Write WITH_REDIRECTS(X) for the set of page names in X and redirects to some page in X.
- Call X closed under inclusion if whenever p includes a page in
WITH_REDIRECTS(X)X, then p is in X. Write INCLUSION_CLOSURE(X) for the smallest extension of X closed under inclusion. This can be computed recursively. Write INCLUSION_REDIRECTS(X) for WITH_REDIRECTS(INCLUSION_CLOSURE(X)).- Write LINK_TO(X) for the set of pages linking to some page in X.
Then when a page p is edited:
- For content changes: expire
INCLUSION_REDIRECTS(p)INCLUSION_CLOSURE(p). - For wanted link status changes: expire
INCLUSION_REDIRECTS(LINK_TO(WITH_REDIRECTS(p)))INCLUSION_CLOSURE(LINK_TO(WITH_REDIRECTS(p))).
There some further problems, which seem however by design and not so easily fixed:
- None of the page/revision edit and wiki_reference query database transactions are atomic, so presumably there are race conditions with simultaneous edits.
- Links to deleted redirects don't get expired, etc.
Have you looked at WikiReferences::pages_redirected_to
? Does it not do what you want?
I think your complaint is that WikiReferences::pages_that_include
is not computed recursively (or, possibly, that it does not handle inclusions of redirected pages). But that's not quite right.
- Let page A include page B and let page B include page C.
- Edit page C.
- When you save C, page A is expired.
I'm not sure what you mean by "links to deleted redirects don't get expired".
- Say I have a page A that redirects for (the nonexistent) page B and pages C and D that link to B. Clicking on one of those links will land me on page A.
- Now say I edit the page A and remove the redirect. Pages C and D gets expired, and the links on the are turned into "wanted page" links.
- Now say I click on the link on page C. This creates the new page B. When I save B, page D is expired.
Is this not the behaviour you expect?
Just to be clear, inclusion of redirected pages is not allowed.
- Page A redirects for B.
- On page C,
[[!include A]]
is allowed, but[[!include B]]
is not.
I wanted to try, but I couldn't figure out how to install Instiki again. I tried on two systems with ruby-3.0 (Arch Linux and Debian) and also the Dockerfile. Following the README, the command
ruby bundle install --path vendor/bundle
finishes successfully. But when running instiki
I get the same error in every configuration:
~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require': cannot load such file -- rack/handler (LoadError)
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `block in require'
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:547:in `new_constants_in'
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require'
from ~/instiki/script/server:58:in `rescue in <top (required)>'
from ~/instiki/script/server:55:in `<top (required)>'
from ./instiki:6:in `load'
from ./instiki:6:in `<main>'
~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require': cannot load such file -- rack/handler (LoadError)
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `block in require'
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:547:in `new_constants_in'
from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require'
from ~/instiki/script/server:56:in `<top (required)>'
from ./instiki:6:in `load'
from ./instiki:6:in `<main>'
What am I doing wrong?
EDIT: When I tried the Dockerfile, I had to lift the Ruby verison to 2.6. Otherwise, the container build failed.
A separate cache expiration problem I just noticed when testing:
- Every page says which pages it is included in. This is not expired when that information changes.
Just to be clear, inclusion of redirected pages is not allowed.
* Page A redirects for B. * On page C, `[[!include A]]` is allowed, but `[[!include B]]` is not.
I see. I didn't know that.
I'm not sure what you mean by "links to deleted redirects don't get expired".
Example:
- Create X with link to Y. Link is wanted.
- Create Y1 with redirect for Y. Link on X is not wanted.
- Delete the redirect on Y1. Link on X is not wanted (but should be wanted).
It seems even links to renamed pages don't get expired. Example:
- Create X with link to Y. Link is wanted.
- Create Y. Link on X is not wanted.
- Rename X to X1 without setting a redirect. Link on X is not wanted (but should be wanted).
- Say I have a page A that redirects for (the nonexistent) page B and pages C and D that link to B. Clicking on one of those links will land me on page A.
- Now say I edit the page A and remove the redirect. Pages C and D gets expired, and the links on the are turned into "wanted page" links.
I can't reproduce this.
Here is another bug:
- Create A with inclusion of B. Shows "could not include B".
- Create C.
- Rename C to B. A still shows "could not include B".
I think your complaint is that
WikiReferences::pages_that_include
is not computed recursively (or, possibly, that it does not handle inclusions of redirected pages). But that's not quite right.* Let page A include page B and let page B include page C. * Edit page C. * When you save C, page A is expired.
Counterexample (not involving redirects):
- Create A with inclusion of B. Shows "could not include B".
- Create B with inclusion of C. Shows "could not include C". Footer says "Linked from: A" (should say: included from A).
- Refresh A. Shows "could not include C".
- Create C. Refresh A. Still says "could not include C" (should include C).
EDIT: Page C is not necessary in this example. So just:
- Create A with inclusion of B. Shows "could not include B".
- Create B. Footer says "Linked from: A" (should say: included from A).
- Refresh A. Shows content of B.
- Edit B. Refresh A. Still shows old content of B.