Tackle the issue of CVE reference linkrot

MIT License

cve-archive

It started with this: https://github.com/todb/junkdrawer/tree/master/cve-twitter-refs

I've been talking with @inokii about how to solve this, generally, for CVE references.

The Plan So Far

@inokii suggested using ArchiveBox to process references, fed by a cron job that scrapes them out of the published CVE lists (GitHub's, MITRE's, or whatever).
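As a rough sketch of the scraping step, here is how reference URLs might be pulled out of a single CVE record. It assumes the record follows the CVE JSON 5 layout (references under `containers.cna` and, when present, `containers.adp`). The function name `extract_reference_urls`, the sample record, and its URLs are made up for illustration and are not part of this repo.

```python
import json


def extract_reference_urls(record: dict) -> list[str]:
    """Return de-duplicated reference URLs from one CVE record, in first-seen order."""
    containers = record.get("containers", {})
    # Assumption: "cna" is a single object and "adp" is an optional list of objects.
    sources = [containers.get("cna", {})] + list(containers.get("adp", []))
    seen: dict[str, None] = {}  # dict keys preserve insertion order
    for source in sources:
        for ref in source.get("references", []):
            url = ref.get("url")
            if url:
                seen.setdefault(url, None)
    return list(seen)


# Hypothetical record, trimmed to the fields the function reads.
sample = json.loads("""
{
  "cveMetadata": {"cveId": "CVE-0000-0000"},
  "containers": {
    "cna": {"references": [
      {"url": "https://example.com/advisory"},
      {"url": "https://example.com/patch"}
    ]},
    "adp": [{"references": [
      {"url": "https://example.com/advisory"},
      {"url": "https://example.org/writeup"}
    ]}]
  }
}
""")

urls = extract_reference_urls(sample)
print("\n".join(urls))
```

The one-URL-per-line output is meant to be easy to hand to whatever archiver ends up doing the fetching.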

I'll give that a go. If that works, I'm now thinking we could seed some torrents of CVE references saved as WARC files or something. That way we can replicate at least the references in a disk-searchable way.

Fidelity should be much better going forward, since a reference linked today is presumably far more likely to still be alive than one linked 10 years ago (we've already lost many races against linkrot).

This repo will store configurations and whatnot so someone other than us can replicate this work if they care to.

Archiving is fun.

Archives -> Torrents

You can't update a torrent and expect to keep the hash over time. But what you can do is, say:

  • Get all references for all CVEs archived, as they stand right now, in mid-2023.
  • Get all diffs from that moment on, and create torrents regularly, datestamped by the current date.
    • Remember, the year part of a CVE ID might not be the actual year the CVE was produced! This is a matter of some debate, but language has been proposed to at least scope the problem.
  • "Regularly" might be daily/weekly/monthly/annually. Not sure yet.
  • The torrent idea came from this recent 99pi episode about archiving Geocities.
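
The baseline-plus-diffs idea above could look something like this. It is only a sketch: `new_references`, `snapshot_name`, and the `cve-refs` prefix are hypothetical names, and the URLs are placeholders. Note that the snapshot is stamped with the date it was built, not with the year in any CVE ID.

```python
from datetime import date


def new_references(baseline: set[str], current: set[str]) -> list[str]:
    """URLs present in the current scrape but absent from the baseline."""
    return sorted(current - baseline)


def snapshot_name(day: date, prefix: str = "cve-refs") -> str:
    """Datestamped name for one incremental torrent, e.g. cve-refs-2023-07-01."""
    return f"{prefix}-{day.isoformat()}"


# Placeholder data: what was archived at the baseline vs. what a later scrape found.
baseline = {"https://example.com/advisory", "https://example.com/patch"}
current = baseline | {"https://example.org/writeup"}

delta = new_references(baseline, current)
name = snapshot_name(date(2023, 7, 1))
print(name, delta)
```

Each incremental torrent then only carries the delta, so the baseline torrent's hash never has to change.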

Costs

I'm pretty sure this can all be done on a cheap VPS from Linode or something. It depends on how much real on-disk space is required for all the WARCs and screenshots.

Stay Tuned!

This is more of a scratch space than anything. When the project is real, I'll update it.

If something here is broken, dumb, or otherwise not useful for you, Patches Accepted!