bzeem

Wikipedia on Swarm

Overview

Wikipedia contains a lot of data. Swarm is a decentralized storage network, designed so that data stored on it is hard to lose. Bzeem is a proof-of-concept script that puts snapshots of Wikipedia on Swarm.

Instructions

# Set up a swarm node on your local computer, and make sure it's running:
# https://swarm-guide.readthedocs.io/en/latest/gettingstarted.html

# Download a ZIM file of your choice - see https://wiki.kiwix.org/wiki/Main_Page
# ZIM_URL is a placeholder for the download URL of the file you picked
wget -O myfile.zim "$ZIM_URL"

# Make sure you have a local swarm node running before this:
./bzeem.sh myfile.zim

# Spread the link!
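
The link to spread is normally a bzz:/ URL built from the hash that Swarm reports for the uploaded content. A minimal sketch of building such a URL, assuming a local node serving HTTP on port 8500; the helper name `bzz_url` is hypothetical and not part of bzeem.sh:

```shell
# Hypothetical helper, not part of bzeem.sh: build a gateway URL for a Swarm hash.
# Assumes the bzz:/ URL scheme and, by default, a local node on port 8500.
bzz_url() {
  hash=$1
  gateway=${2:-http://localhost:8500}
  # Strip one trailing slash from the gateway so the result has no "//".
  printf '%s/bzz:/%s/\n' "${gateway%/}" "$hash"
}
```

Pass a second argument to point the link at a different gateway, for example `bzz_url "$HASH" https://gateway.example`, where `HASH` and the gateway host are placeholders.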

How it works

  • Kiwix is one way to view Wikipedia content without wikipedia.org: it reads snapshots of Wikipedia stored in .zim files
  • .zim files are compressed archives of HTML files and images, similar to .zip, .tar or .epub
  • bzeem.sh expands a .zim file into a local folder, makes some adjustments and uploads the result to Swarm
  • The uploaded content can then be viewed through any Swarm gateway
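
The steps above can be sketched roughly as follows. This is not the actual bzeem.sh. It assumes the `zimdump` tool from zim-tools for extraction (its `dump --dir=` syntax varies between versions) and the `swarm` command-line client for a recursive upload. The function names `run` and `publish_zim` and the `DRY_RUN` switch are hypothetical, and the adjustment step is left as a comment because its details are not described here.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Sketch of the flow: expand the archive, then upload the folder.
# Tool names and flags are assumptions, not taken from bzeem.sh.
publish_zim() {
  zim_file=$1
  # Default output folder: the file name without its .zim suffix.
  out_dir=${2:-${zim_file%.zim}}

  # 1. Expand the .zim archive into a local folder.
  run zimdump dump --dir="$out_dir" "$zim_file"

  # 2. (Adjustments to the extracted files would go here.)

  # 3. Upload the folder to Swarm via the locally running node.
  run swarm --recursive up "$out_dir"
}
```

Running `DRY_RUN=1; publish_zim myfile.zim` prints the two commands without executing them, which is a cheap way to check paths before starting a long upload.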

Where to get Wikipedia data

  • Kiwix hosts ZIM files for a wide range of open content
  • Wikimedia has more recent and complete download options, but does not offer ZIM files directly

Inspiration

Bzeem was inspired by:

  • distributed-wikipedia-mirror, a similar project that stores Wikipedia on IPFS. Bzeem right now is basically a poor man's port of that project, without the nice features (like search!)
  • XOWA has an alternate approach: store raw Wikipedia markup and render it on the fly
  • This article spells out the differences between Kiwix and XOWA

Potential ways forward

Short term & known issues

  • good enough for MVP :)

Longer term

  • Store raw data and render it on the fly
    • In browsers, it shouldn't be impossible to do this in JavaScript (scripts, anyone?)
    • The full history can be downloaded from Wikimedia (see above) - each version could be stored
    • How to store versions efficiently? Look to git for inspiration?
  • ENS
  • Swarm feeds?
  • Search - potentially, this can be done by building an index that JavaScript in the browser consumes. Cool WASM project, anyone?
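
As a rough illustration of the search idea, an index could start as a plain list of page paths and titles generated before upload. The sketch below is only an assumption about how that might look: it supposes the extracted pages are files ending in `.html` with a `<title>` element on a single line, and the function name `build_title_index` is hypothetical. It emits tab-separated lines rather than JSON to avoid quoting issues.

```shell
# Hypothetical sketch: list "relative/path<TAB>title" for every extracted page.
# Assumes pages end in .html and keep <title>...</title> on one line.
build_title_index() {
  dir=$1
  find "$dir" -type f -name '*.html' | sort | while IFS= read -r f; do
    # Take the text of the first <title> element found in the file.
    title=$(sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$f" | head -n 1)
    # Print the path relative to the extracted folder, then the title.
    printf '%s\t%s\n' "${f#"$dir"/}" "$title"
  done
}
```

The output could be written into the folder before upload, for example `build_title_index myfile > myfile/titles.tsv`, so that a script in the browser can fetch and filter it; both names are placeholders.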

Troubleshooting

  • I uploaded a ZIM and the link doesn't work!
    • It might take a while to sync data between peers in Swarm - try again in a bit
    • Check your connectivity to the Swarm network in general
    • There might not be enough capacity to host your material at the moment - help the network by deploying some (stable) nodes - it's very easy
    • Swarm itself is under heavy development - head over to their GitHub and fix some bugs :)
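
"Try again in a bit" can be automated with a small retry loop. This is a sketch, not part of bzeem.sh: the helper name `retry` is hypothetical, and the example URL in the comment assumes a local gateway on port 8500 and a placeholder `HASH` variable.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumed URL): poll the link every 30 seconds, up to 10 times.
# retry 10 30 curl -fsS -o /dev/null "http://localhost:8500/bzz:/$HASH/"
```

If the loop still fails after all attempts, the remaining causes listed above (connectivity, capacity) are the next things to check.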

Funding

Like this project? Send something over to:

ETH: 0x90DD70149566E76DAF9E43893f836343bbCB9232

Any kind of TX welcome :)