dolfinus/AutoSitemap

Taking so much processing time

0x416c69 opened this issue · 5 comments

When I want to edit or add a page, AutoSitemap takes 70 seconds to process.

These are my settings:

$wgAutoSitemap["notify"] = [
    'https://www.google.com/webmasters/sitemaps/ping?sitemap=https://wiki.arsacia.ir/sitemap.xml',
    'https://www.bing.com/webmaster/ping.aspx?sitemap=https://wiki.arsacia.ir/sitemap.xml'
];

$wgAutoSitemap["filename"] = "sitemap.xml";

MediaWiki Version: 1.34
PHP Version: 7.2.24 (litespeed)
MySQL Version: 5.5.62-0ubuntu0.14.04.1

Latest AutoSitemap files on master branch.

The sitemap is regenerated on every page create/update/rename/delete. After any of these events is raised, the extension runs a SQL query to fetch all pages that should appear in the sitemap, and the execution time of that query grows with the total page count.

So how many pages does your wiki have?

There are a few ways to decrease sitemap generation time:

  • Add namespaces containing unimportant pages to the exclusion list using the option $wgAutoSitemap["exclude_namespaces"]
  • Add unimportant page names to the exclusion list using the option $wgAutoSitemap["exclude_pages"]
  • Disable the priority calculation based on revision count by setting the option $wgAutoSitemap["freq"] to any supported value other than adjust (the default)
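For example, the three options above might be combined like this (the namespace ID, page names, and freq value are placeholders — check the extension's documentation for the values it actually supports):

```php
// Exclude the Talk namespace (ID 1 is a placeholder example) and a couple
// of unimportant pages, and use a fixed change frequency instead of the
// default "adjust", which skips the per-page revision-count calculation.
$wgAutoSitemap["exclude_namespaces"] = [ 1 ];            // placeholder namespace ID
$wgAutoSitemap["exclude_pages"] = [ "Sandbox", "Test" ]; // placeholder page names
$wgAutoSitemap["freq"] = "daily";                        // assumed supported value
```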

I have 131 pages and I believe that it's not even close to being considered a lot. I also have powerful hardware.

Isn't there some sort of caching? Does AutoSitemap compute and recalculate all pages on a single page edit/add/delete/rename? Well, if that's the case, that's where this extension's problem lies.

"Does AutoSitemap compute and recalculate all pages on a single page edit/add/delete/rename?"
Yes, of course.

There is no explicit caching of pages because the result could be wrong. A cache should only be used for rarely changing data, like MW settings, or for data that is kept in the cache for only seconds. Otherwise you can get errors because some data no longer exists, or was updated in the master source but not in the cache.
The sitemap doesn't fit either of these cases, so neither a static class variable nor something like memcached can be used as a cache here.

But you can improve this behavior using OPcache. If that PHP extension is enabled, parsing and executing the code takes less time, so the sitemap will be generated faster without any issues caused by frequently updated data.
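A minimal php.ini fragment for enabling OPcache might look like this (the directive names are standard OPcache settings; the memory value is only an example and your distribution may already load the extension for you):

```ini
; Load and enable the OPcache extension
zend_extension=opcache
opcache.enable=1
; Also cache compiled scripts for CLI runs (e.g. maintenance scripts)
opcache.enable_cli=1
; Example sizing in MB; tune for your installation
opcache.memory_consumption=128
```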

I use PHP 7.3.12, MySQL 5.7.21 and MW 1.30.0 on shared hosting, and it takes about 3 seconds to generate the sitemap for more than 2000 pages.
The real execution time is even less than that, because I also use the PhpTags extension, which parses and executes PHP tags in page content in a sandbox, and it too runs on every page change.

Actually, I'm quite surprised that generating a sitemap for such a small number of pages takes so long.

I'm not much of an MW expert, but I don't think we're talking about the same kind of cache.

Obviously, when a page's content is updated in the database, it remains as it is until it gets updated again.

And the sitemap is just a file, and writing to a file is FAST; it could be the MySQL server responding to the query too slowly, or a lot of computation in the PHP process. You could build a per-page file for the sitemap and regenerate that file only when its page is updated (or create/delete it when the page is created/deleted), or just use a new table instead of file storage.

Then combine and compile the sitemap from that existing data instead.
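The incremental approach described above is language-agnostic, so here is a minimal sketch of the idea in Python (not the extension's actual code; the URLs and dates are made up): cache one <url> entry per page, recompute only the entry for the page that changed, and concatenate the cached entries into the sitemap file.

```python
# Sketch of incremental sitemap generation: one cached <url> entry per page.
from html import escape

def url_entry(loc: str, lastmod: str) -> str:
    """Render a single <url> element for one wiki page."""
    return f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>"

def render_sitemap(entries: dict) -> str:
    """Concatenate the cached per-page entries into one sitemap document."""
    body = "\n".join(entries.values())
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</urlset>\n")

# The initial build touches every page once...
entries = {
    "Main_Page": url_entry("https://wiki.example.org/wiki/Main_Page", "2020-01-01"),
    "Help": url_entry("https://wiki.example.org/wiki/Help", "2020-01-01"),
}
# ...but a later edit to one page recomputes only that page's entry,
# instead of re-querying and re-rendering the whole wiki.
entries["Help"] = url_entry("https://wiki.example.org/wiki/Help", "2020-02-15")
sitemap = render_sitemap(entries)
```

The same idea works with a database table keyed by page ID instead of an in-memory dict; the point is that a single page event costs one entry update plus one file write, not a full scan.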

Looking at your code, there's no transaction or caching.

You're assuming which step of sitemap generation is slow without any evidence, such as logs or SQL query traces. Discussing optimizations without that information looks like premature optimization, which is the root of all evil.

Providing that information is important for getting positive results in speeding this extension up.