lahwaacz/wiki-scripts

Mirroring a wiki

To-do list:

  • abstraction for working with an SQL database (a sketch follows below the list)
  • grabbers for fetching important tables from the API (see the grabber sketch below)
    • namespace, namespace_name (custom tables)
    • recentchanges, logging
    • user, user_groups, ipblocks
    • page, page_props, page_restrictions, protected_titles
    • archive, revision, text: waits for the list=allrevisions module in MediaWiki 1.27 (https://phabricator.wikimedia.org/T113885)
    • tags
    • interwiki
  • handle difficult actions involving DELETE or UPDATE queries as part of the syncing process (a delete example is sketched below):
    • removing from user groups
    • unblock
    • unprotect
    • delete (move from revision to archive)
    • undelete (move from archive to revision, also check page_id)
    • selective undelete
    • merge (works assuming that neither the source nor the target page was deleted before the sync)
    • import (works assuming that the imported pages were not deleted, merged etc. before the sync)
    • delete/revision, delete/event
    • tag/update (separately for recentchanges, logging, revision, archive)
    • other log events: https://wiki.archlinux.org/api.php?action=help&modules=query%2Blogevents
  • let the SQL database serve as a source of data instead of the API (see the list=allpages sketch below)
    • list=recentchanges
    • list=logevents
    • list=allpages
    • list=protectedtitles
    • list=allrevisions
    • list=alldeletedrevisions
    • titles=, pageids= for use with prop=
    • common executor for the DB select queries (for easy profiling)
  • framework for tests (fixture skeleton below)
    • pytest fixture for web server (nginx)
    • pytest fixture for php-fpm
    • pytest fixture for MediaWiki installation (depends on nginx, php-fpm, postgresql + MW sources, config, initial SQL)
    • write the tests...
    • implement a double-source wrapper which yields from the API and cross-checks the DB selects, ignoring NotImplementedError etc. (usable for unit tests as well as real-world testing; sketched below); split into #50
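
For the SQL abstraction, something along these lines should do. This is a minimal sketch assuming SQLAlchemy core, with an illustrative subset of MediaWiki's page and revision columns; all names here are hypothetical, not the final wiki-scripts API:

```python
# Sketch of the SQL abstraction using SQLAlchemy core; the column
# subset is illustrative only (MediaWiki's real schema has many more
# columns, see maintenance/tables.sql upstream).
import sqlalchemy as sa

metadata = sa.MetaData()

page = sa.Table(
    "page", metadata,
    sa.Column("page_id", sa.Integer, primary_key=True),
    sa.Column("page_namespace", sa.Integer, nullable=False),
    sa.Column("page_title", sa.UnicodeText, nullable=False),
)

revision = sa.Table(
    "revision", metadata,
    sa.Column("rev_id", sa.Integer, primary_key=True),
    sa.Column("rev_page", sa.Integer, sa.ForeignKey("page.page_id")),
    sa.Column("rev_timestamp", sa.DateTime, nullable=False),
)

def connect(url="postgresql:///mediawiki_mirror"):
    """Create an engine and make sure all mirrored tables exist."""
    engine = sa.create_engine(url)
    metadata.create_all(engine)
    return engine
```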
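
Each grabber is basically a paged API query written into its table. A sketch using plain requests and the tables above; the grab_pages name and its parameters are made up for illustration, and the real grabbers will also need prop continuation and incremental updates:

```python
import requests

def grab_pages(api_url, engine, page_table):
    """Mirror the page table: page through list=allpages and insert
    every entry, following the standard 'continue' protocol."""
    session = requests.Session()
    params = {
        "action": "query", "format": "json",
        "list": "allpages", "aplimit": "max",
    }
    while True:
        data = session.get(api_url, params=params).json()
        rows = [{"page_id": p["pageid"],
                 "page_namespace": p["ns"],
                 "page_title": p["title"]}
                for p in data["query"]["allpages"]]
        with engine.begin() as conn:
            conn.execute(page_table.insert(), rows)
        if "continue" not in data:
            break
        # Merge the continuation parameters into the next request.
        params.update(data["continue"])
```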
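
The difficult log events mostly translate into DELETE/UPDATE/INSERT ... SELECT statements inside one transaction. For example, a page deletion could be synced roughly like this (simplified: the real archive table renames the columns and keeps more of them):

```python
import sqlalchemy as sa

def sync_page_delete(engine, page_table, revision_table, archive_table, page_id):
    """Mirror a delete log event: move the page's revisions into the
    archive table, then drop the page row itself."""
    with engine.begin() as conn:
        # Copy the revisions into archive (INSERT ... SELECT).
        sel = sa.select(
            revision_table.c.rev_id,
            revision_table.c.rev_page,
            revision_table.c.rev_timestamp,
        ).where(revision_table.c.rev_page == page_id)
        conn.execute(
            archive_table.insert().from_select(
                ["ar_rev_id", "ar_page_id", "ar_timestamp"], sel
            )
        )
        # Remove the originals.
        conn.execute(
            revision_table.delete().where(revision_table.c.rev_page == page_id)
        )
        conn.execute(
            page_table.delete().where(page_table.c.page_id == page_id)
        )
```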
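
Serving list= modules from the database is then a matter of translating each module's parameters into a SELECT; the common executor would wrap the conn.execute() call for profiling. A sketch for list=allpages, again with hypothetical names:

```python
import sqlalchemy as sa

def db_allpages(engine, page_table, namespace=0):
    """Yield dicts shaped like the API's list=allpages entries,
    produced from the local database instead of the API."""
    query = (
        sa.select(page_table.c.page_id,
                  page_table.c.page_namespace,
                  page_table.c.page_title)
        .where(page_table.c.page_namespace == namespace)
        .order_by(page_table.c.page_title)
    )
    with engine.connect() as conn:
        for row in conn.execute(query):
            yield {"pageid": row.page_id, "ns": row.page_namespace,
                   "title": row.page_title}
```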
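
The fixture dependencies map directly onto pytest's injection. A skeleton of the intended chain; all bodies are placeholders, and the postgresql fixture is assumed to come from the pytest-postgresql plugin:

```python
import pytest

@pytest.fixture
def php_fpm():
    # Placeholder: spawn a php-fpm pool, yield the socket path and
    # tear the process down afterwards.  The real fixtures would
    # probably be session-scoped.
    yield "/run/php-fpm/test.sock"

@pytest.fixture
def nginx(php_fpm):
    # Placeholder: write a server block pointing at the php-fpm
    # socket, start nginx, yield the base URL.
    yield "http://localhost:8080"

@pytest.fixture
def mediawiki(nginx, postgresql):
    # Placeholder: unpack the MediaWiki sources, generate
    # LocalSettings.php and load the initial SQL into the database
    # behind the postgresql fixture (pytest-postgresql).
    yield nginx + "/mediawiki"

def test_mainpage(mediawiki):
    # An eventual test would talk to the throwaway wiki here.
    assert mediawiki.startswith("http://")
```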
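
And a sketch of the double-source wrapper (now split into #50), assuming the DB-backed generators raise NotImplementedError on first use when a module is missing:

```python
import itertools

def double_source(api_gen, db_gen):
    """Yield entries from the API generator, cross-checking each one
    against the DB-backed generator; if the DB query is not implemented
    yet, fall back to the API alone."""
    try:
        # Probe the DB generator first: NotImplementedError surfaces on
        # the first next() call, before anything is pulled from the API.
        first = next(db_gen)
        db_gen = itertools.chain([first], db_gen)
    except NotImplementedError:
        yield from api_gen
        return
    except StopIteration:
        db_gen = iter(())
    for api_entry, db_entry in itertools.zip_longest(api_gen, db_gen):
        # zip_longest pads the shorter side with None, so a length
        # mismatch also fails the assertion.
        assert api_entry == db_entry, (api_entry, db_entry)
        yield api_entry
```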

This is somewhat finished and working nicely, so it's time to close this.