parasew/instiki

Broken page cache expiration logic

sattlerc opened this issue · 9 comments

This is currently the logic for expiring a page:

  def expire_caches(page)
    expire_cached_summary_pages(page.web)
    pages_to_expire = [page.name] +
       WikiReference.pages_redirected_to(page.web, @will_expire) +
       WikiReference.pages_that_include(page.web, @will_expire)
    unless (page.name == @will_expire)
      pages_to_expire.concat ([@will_expire] +
        WikiReference.pages_that_link_to(page.web, @will_expire) +
        WikiReference.pages_that_reference(page.web, page.name))
    end
    pages_to_expire.uniq.each { |page_name| expire_cached_page(page.web, page_name) }
  end

But shouldn't it be as follows?

Notation for a set of pages X:

  • Write WITH_REDIRECTS(X) for the set of page names in X and redirects to some page in X.
  • Call X closed under inclusion if whenever p includes a page in WITH_REDIRECTS(X) X, then p is in X. Write INCLUSION_CLOSURE(X) for the smallest extension of X closed under inclusion. This can be computed recursively.
  • Write INCLUSION_REDIRECTS(X) for WITH_REDIRECTS(INCLUSION_CLOSURE(X)).
  • Write LINK_TO(X) for the set of pages linking to some page in X.

Then when a page p is edited:

  • For content changes: expire INCLUSION_REDIRECTS(p) INCLUSION_CLOSURE(p).
  • For wanted link status changes: expire INCLUSION_REDIRECTS(LINK_TO(WITH_REDIRECTS(p))) INCLUSION_CLOSURE(LINK_TO(WITH_REDIRECTS(p))).

There some further problems, which seem however by design and not so easily fixed:

  • None of the page/revision edit and wiki_reference query database transactions are atomic, so presumably there are race conditions with simultaneous edits.
  • Links to deleted redirects don't get expired, etc.

Have you looked at WikiReferences::pages_redirected_to? Does it not do what you want?

I think your complaint is that WikiReferences::pages_that_include is not computed recursively (or, possibly, that it does not handle inclusions of redirected pages). But that's not quite right.

  • Let page A include page B and let page B include page C.
  • Edit page C.
  • When you save C, page A is expired.

I'm not sure what you mean by "links to deleted redirects don't get expired".

  • Say I have a page A that redirects for (the nonexistent) page B and pages C and D that link to B. Clicking on one of those links will land me on page A.
  • Now say I edit the page A and remove the redirect. Pages C and D gets expired, and the links on the are turned into "wanted page" links.
  • Now say I click on the link on page C. This creates the new page B. When I save B, page D is expired.

Is this not the behaviour you expect?

Just to be clear, inclusion of redirected pages is not allowed.

  • Page A redirects for B.
  • On page C, [[!include A]] is allowed, but [[!include B]] is not.

I wanted to try, but I couldn't figure out how to install Instiki again. I tried on two systems with ruby-3.0 (Arch Linux and Debian) and also the Dockerfile. Following the README, the command

ruby bundle install --path vendor/bundle

finishes successfully. But when running instiki I get the same error in every configuration:

~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require': cannot load such file -- rack/handler (LoadError)
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `block in require'
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:547:in `new_constants_in'
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require'
        from ~/instiki/script/server:58:in `rescue in <top (required)>'
        from ~/instiki/script/server:55:in `<top (required)>'
        from ./instiki:6:in `load'
        from ./instiki:6:in `<main>'
~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require': cannot load such file -- rack/handler (LoadError)
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `block in require'
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:547:in `new_constants_in'
        from ~/instiki/vendor/rails/activesupport/lib/active_support/dependencies.rb:182:in `require'
        from ~/instiki/script/server:56:in `<top (required)>'
        from ./instiki:6:in `load'
        from ./instiki:6:in `<main>'

What am I doing wrong?

EDIT: When I tried the Dockerfile, I had to lift the Ruby verison to 2.6. Otherwise, the container build failed.

A separate cache expiration problem I just noticed when testing:

  • Every page says which pages it is included in. This is not expired when that information changes.

Just to be clear, inclusion of redirected pages is not allowed.

* Page A redirects for B.

* On page C, `[[!include A]]` is allowed, but `[[!include B]]` is not.

I see. I didn't know that.

I'm not sure what you mean by "links to deleted redirects don't get expired".

Example:

  • Create X with link to Y. Link is wanted.
  • Create Y1 with redirect for Y. Link on X is not wanted.
  • Delete the redirect on Y1. Link on X is not wanted (but should be wanted).

It seems even links to renamed pages don't get expired. Example:

  • Create X with link to Y. Link is wanted.
  • Create Y. Link on X is not wanted.
  • Rename X to X1 without setting a redirect. Link on X is not wanted (but should be wanted).
  • Say I have a page A that redirects for (the nonexistent) page B and pages C and D that link to B. Clicking on one of those links will land me on page A.
  • Now say I edit the page A and remove the redirect. Pages C and D gets expired, and the links on the are turned into "wanted page" links.

I can't reproduce this.

Here is another bug:

  • Create A with inclusion of B. Shows "could not include B".
  • Create C.
  • Rename C to B. A still shows "could not include B".

I think your complaint is that WikiReferences::pages_that_include is not computed recursively (or, possibly, that it does not handle inclusions of redirected pages). But that's not quite right.

* Let page A include page B and let page B include page C.

* Edit page C.

* When you save C, page A is expired.

Counterexample (not involving redirects):

  • Create A with inclusion of B. Shows "could not include B".
  • Create B with inclusion of C. Shows "could not include C". Footer says "Linked from: A" (should say: included from A).
  • Refresh A. Shows "could not include C".
  • Create C. Refresh A. Still says "could not include C" (should include C).

EDIT: Page C is not necessary in this example. So just:

  • Create A with inclusion of B. Shows "could not include B".
  • Create B. Footer says "Linked from: A" (should say: included from A).
  • Refresh A. Shows content of B.
  • Edit B. Refresh A. Still shows old content of B.