ipfs-inactive/archives

Automatic Updates

davidar opened this issue · 4 comments

All of the archives are currently being imported to IPFS manually. This is fine as a starting point, but we need scripts that keep them up to date with their origins, run periodically, and publish the changes over IPNS.

More specifically, what I'm looking for is:

  • a standard format for storing archive metadata: authoritative timestamps, raw file hashes (e.g. md5sums/sha1sums, not just unixfs hashes), and whatever other information is useful in determining if a file has been updated
  • protocol handlers (rsync, OAI, HTTP, FTP, resync, etc.) that can use this metadata to determine exactly which files need to be updated
  • a tool that can splice the new files into the existing archive, without having to download the current contents of the archive (which could be quite large)
  • a script to glue all of this together nicely, that can automatically push updates to IPNS, and is cron-friendly
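Here is a minimal Python sketch of the first three points. It assumes a JSON manifest of per-file records. `FileRecord`, the manifest layout, and every function name are hypothetical illustrations, not an agreed format. `needs_update` compares raw SHA-1 hashes when both sides have one. Otherwise it falls back to size plus timestamp, the same quick check rsync applies by default. `splice_commands` only plans `ipfs files` (MFS) commands and does not run them. It assumes three things: the new files have already been added to IPFS, their hashes are collected in `new_cids`, and the MFS path `/archive` does not exist yet.

```python
import hashlib
import json
import posixpath
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileRecord:
    """One entry in a hypothetical per-archive manifest."""
    path: str            # path relative to the archive root
    size: int            # size in bytes, as reported by the origin
    mtime: str           # authoritative modification time (ISO 8601)
    sha1: Optional[str]  # raw SHA-1 of the contents, if the origin provides one


Manifest = Dict[str, FileRecord]


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def dump_manifest(m: Manifest) -> str:
    # Sorted by path so identical manifests serialise identically.
    return json.dumps([asdict(m[p]) for p in sorted(m)], indent=2)


def load_manifest(text: str) -> Manifest:
    return {r["path"]: FileRecord(**r) for r in json.loads(text)}


def needs_update(old: FileRecord, new: FileRecord) -> bool:
    # Prefer raw content hashes; if either side lacks one, fall back to
    # comparing size + timestamp (rsync's default quick check).
    if old.sha1 and new.sha1:
        return old.sha1 != new.sha1
    return (old.size, old.mtime) != (new.size, new.mtime)


def diff_manifests(old: Manifest, new: Manifest) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in new if p in old and needs_update(old[p], new[p])),
    }


def splice_commands(diff: Dict[str, List[str]], new_cids: Dict[str, str],
                    last_root: str, mfs_dir: str = "/archive") -> List[List[str]]:
    """Plan `ipfs files` commands that patch the previous root in MFS.

    Copying /ipfs/<last_root> into MFS links to the existing DAG, so the
    contents of unchanged files should not need to be downloaded.
    """
    cmds = [["ipfs", "files", "cp", f"/ipfs/{last_root}", mfs_dir]]
    # Drop files that disappeared or will be replaced.
    for p in diff["removed"] + diff["changed"]:
        cmds.append(["ipfs", "files", "rm", posixpath.join(mfs_dir, p)])
    # Link in new and updated files, creating parent directories as needed.
    for p in sorted(diff["added"] + diff["changed"]):
        dest = posixpath.join(mfs_dir, p)
        cmds.append(["ipfs", "files", "mkdir", "-p", posixpath.dirname(dest)])
        cmds.append(["ipfs", "files", "cp", f"/ipfs/{new_cids[p]}", dest])
    # Print the hash of the updated tree: the new root to publish.
    cmds.append(["ipfs", "files", "stat", "--hash", mfs_dir])
    return cmds
```

Running the planned commands in order, for example with subprocess.run, should leave the updated tree at /archive, and the final `stat --hash` yields the new root to publish over IPNS. Because MFS links to the existing DAG, only the directory nodes along the modified paths should need fetching.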

CC: @rht @jbenet #18

Very much agreed. Ideally, for everything we import we would write a program that does the import and, when run again, produces an updated root (and the IPFS root of the archive would link to that program too).

I think we can define a standard format for command-line tools that do this:

> archive-arxiv.org <last-root>
> archive-wikipedia.org <last-root>

etc.
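Here is a minimal sketch of the `archive-<site> <last-root>` form in Python, the language suggested later in the thread. `make_cli` and the `update` callback are hypothetical. The callback is assumed to do the site-specific fetching and splicing and to return the new root hash. The last root is optional, so the same tool can also do a first import. Printing only the new root keeps the tool cron-friendly, because a wrapper script can capture stdout.

```python
import argparse
from typing import Callable, List, Optional

# Hypothetical site-specific updater: takes the previous root hash
# (None on a first import) and returns the new root hash.
Updater = Callable[[Optional[str]], str]


def make_cli(site: str, update: Updater) -> Callable[[Optional[List[str]]], str]:
    """Build a main() for the `archive-<site> <last-root>` convention."""
    parser = argparse.ArgumentParser(
        prog=f"archive-{site}",
        description=f"Fetch changes from {site} and print the new archive root.",
    )
    parser.add_argument(
        "last_root",
        nargs="?",
        default=None,
        help="root hash of the previous snapshot; omit for a fresh import",
    )

    def main(argv: Optional[List[str]] = None) -> str:
        # argv=None makes argparse read sys.argv[1:], as a real entry point would.
        args = parser.parse_args(argv)
        new_root = update(args.last_root)
        # Print only the hash so cron jobs and shell scripts can capture it.
        print(new_root)
        return new_root

    return main
```

Suppose this is installed as an `archive-arxiv.org` entry point. A cron job could then run `NEW=$(archive-arxiv.org "$OLD")` and follow it with `ipfs name publish "/ipfs/$NEW"` to push the update over IPNS.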

@vinctux @CounterPillow @jbenet Migrated off-topic discussion to ipfs/notes#50

Which language should the update tools be written in?

@vinctux my personal preference for this would be Python :)