ipfs/distributed-wikipedia-mirror

Make `execute-changes` script more generally usable

flyingzumwalt opened this issue · 4 comments

Make the script from #18, which lives at https://github.com/ipfs/distributed-wikipedia-mirror/blob/master/execute-changes.sh, more generally usable. This is needed to fulfill #14 and to make it easier for other people to generate their own snapshots.

As a hacker who wants to add new wikipedia snapshots to IPFS in the language of my choice, I should be able to follow a clear set of instructions that allow me to download a zim dump, add it to IPFS, modify it with this script and then publish the resulting hash. The instructions should be clear, the configuration should be simple and it should be easy for me to set the correct IPNS hash and snapshot date based on the language version I'm adding.

Completion Requirements

  • the shell script works with minimal pre-configuration of your system
  • the shell script's documentation clearly declares what you need to do in order to run it
  • the shell script makes it easy or completely transparent to set the correct IPNS hash and snapshot date based on the current date and the language of the current snapshot
  • the readme at the root of this repo, or a page it links to, contains complete and accurate instructions for using this script when you're creating a snapshot

@Kubuxu cleaned up the script a bunch, making it much more general and adding command line args with help text. Still needs documentation on how to use it.

How to use the `execute-changes.sh` script:

  1. Extract your dump as described in the instructions.
  2. Add it to IPFS and note the hash, as described in the instructions.
  3. Link the dump into the virtual IPFS Files API: `ipfs files cp /ipfs/$DUMP_HASH /wiki-zn`
  4. Run `execute-changes.sh` against the linked path:
     `bash execute-changes.sh /wiki-zn`
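
Steps 3 and 4 can be sketched as a small wrapper. This is only an illustration: the `run` and `link_and_run` helpers, the `DRY_RUN` variable and the placeholder hash `QmYourDumpHash` are hypothetical names, not part of `execute-changes.sh`. By default it only prints the commands, so it can be tried without a running IPFS daemon.

```shell
#!/usr/bin/env bash
# Sketch of steps 3 and 4. With DRY_RUN=1 (the default here) commands are
# printed instead of executed. All helper names are hypothetical.

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# link_and_run DUMP_HASH MFS_PATH
# Links the added dump into the IPFS Files API, then runs the script on it.
link_and_run() {
  local dump_hash="$1" mfs_path="$2"
  run ipfs files cp "/ipfs/$dump_hash" "$mfs_path"
  run bash execute-changes.sh "$mfs_path"
}

# QmYourDumpHash is a placeholder; use the hash you noted in step 2.
link_and_run "${DUMP_HASH:-QmYourDumpHash}" /wiki-zn
```

Setting `DRY_RUN=0` would execute the two commands for real, assuming `ipfs` is on the `PATH` and `execute-changes.sh` is in the current directory.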

`execute-changes.sh` accepts several options, including:

  • `--ipns $IPNS` to add the IPNS link of the dump
  • `--search $SEARCH_CID` to add the search function; this requires the CID of a generated search structure
  • `--date $SOME_DATE` to change the snapshot creation date
  • `--main $MAIN_PAGE` to set the full name of the article containing the intro page (e.g. `Main_Page.html`, as in the English wiki); defaults to `index.htm`, as in many Kiwix dumps
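
One way to combine these options is to build the command line from whichever settings are present. The sketch below is an assumption-laden illustration: the `build_cmd` helper and the `SNAPSHOT_DATE` variable are hypothetical, and placing the options before the path is a guess, so check the script's help text for the actual argument order. It only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble an execute-changes.sh invocation from optional settings.
# Only the four options listed above are used; unset or empty variables are
# skipped. build_cmd and SNAPSHOT_DATE are hypothetical names.

# build_cmd MFS_PATH
# Prints the command line; it does not execute anything.
build_cmd() {
  local root="$1"
  local args=()
  if [ -n "${IPNS:-}" ]; then args+=(--ipns "$IPNS"); fi
  if [ -n "${SEARCH_CID:-}" ]; then args+=(--search "$SEARCH_CID"); fi
  if [ -n "${SNAPSHOT_DATE:-}" ]; then args+=(--date "$SNAPSHOT_DATE"); fi
  if [ -n "${MAIN_PAGE:-}" ]; then args+=(--main "$MAIN_PAGE"); fi
  # Assumption: options come before the path.
  args+=("$root")
  echo "bash execute-changes.sh ${args[*]}"
}

# Example: the English wiki uses Main_Page.html as its intro page.
MAIN_PAGE=Main_Page.html
build_cmd /wiki-zn
```

Printing the command first makes it easy to review the IPNS link and date before modifying a multi-gigabyte dump.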

lidel commented

Update from #60: I tried to run execute-changes.sh against an unpacked wikipedia_tr_all_maxi_2019-12.zim (~4GB), without success. The JS and the directory structure have changed so much that the entire execute-changes.sh needs to be redone.

lidel commented

Continued in #64