Website conservation utility for UNNA.org

Primary language: Shell. License: MIT.

Conservancy

OVERVIEW

A utility for preserving websites on mirrors.UNNA.org. It is primarily a wrapper around wget, but it also performs additional verification tasks.

PREREQUISITES

USAGE

Running conserve <url> slowly and recursively mirrors the page at <url>, along with its sibling and child pages (but not parent pages), using wget.
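To give a rough idea of what such a mirror involves, here is a minimal sketch built from standard wget options. The function name mirror_site, the WGET override variable, and the 2-second wait are assumptions made for illustration; the options that conserve actually passes to wget may differ.

```shell
# Hypothetical sketch, not the actual conserve script.
# Mirrors a URL recursively and politely using standard wget options.
mirror_site() {
  url=$1
  # WGET can be overridden (e.g. WGET=echo) to preview the command
  # without touching the network. This variable is an assumption.
  # --mirror           recursive download with timestamping, unlimited depth
  # --no-parent        never ascend above the starting directory
  # --page-requisites  also fetch images, CSS, etc. needed to render pages
  # --wait/--random-wait  pause between requests to keep the crawl slow
  ${WGET:-wget} \
    --mirror \
    --no-parent \
    --page-requisites \
    --wait=2 --random-wait \
    "$url"
}
```

Calling WGET=echo mirror_site 'https://example.org/docs/' prints the argument list instead of running wget, which is a cheap way to check the options before starting a long crawl.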

The site will be archived into a directory named for the URL's hostname. A wget log file will also be generated.
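The directory and log naming can be pictured roughly as follows. This is only a sketch: the helper names url_host and log_name, and the ".wget.log" suffix, are hypothetical and not taken from conserve. Note that wget's recursive mode creates a hostname directory by default, and its --output-file option writes the log to a chosen file.

```shell
# Hypothetical helpers, shown for illustration only.

# Print the hostname part of a URL: strip the scheme, then everything
# from the first "/", "?" or "#", then any userinfo and port.
url_host() {
  printf '%s\n' "$1" | sed \
    -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' \
    -e 's|[/?#].*$||' \
    -e 's|^.*@||' \
    -e 's|:[0-9]*$||'
}

# Print a log file name derived from the hostname.
# The ".wget.log" suffix is an assumption, not conserve's actual naming.
log_name() {
  printf '%s.wget.log\n' "$(url_host "$1")"
}
```

With these, a call such as wget --mirror --no-parent --output-file="$(log_name "$url")" "$url" would place the pages under a directory named for the host and the log next to it.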

Upon completion, it'll output the following to STDOUT:

  • Any missing files (linking to Internet Archive's Wayback Machine if the files exist there, plus listing any similarly named files that were downloaded)
  • Any files still containing links to the URL
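The two checks above could be approximated along these lines. This is a sketch under stated assumptions, not conserve's implementation: the function names are hypothetical, missing_from_log assumes wget's usual log layout (a "--timestamp--  URL" line followed later by "ERROR 404"), and wayback_lookup relies on the Internet Archive's public availability endpoint via curl.

```shell
# Hypothetical helpers, shown for illustration only.

# List URLs that wget reported as 404, assuming each request in the log
# starts with a line like "--2024-01-01 10:00:00--  <url>".
missing_from_log() {
  awk '/^--[0-9]/ { url = $NF } /ERROR 404/ { print url }' "$1"
}

# List mirrored files that still contain the given string (for example
# the original hostname). Uses grep -r, available in GNU and BSD grep.
remaining_links() {
  dir=$1
  needle=$2
  grep -rlF -- "$needle" "$dir"
}

# Ask the Wayback Machine whether a snapshot of a URL exists.
# Prints the raw JSON response; requires curl and network access.
wayback_lookup() {
  curl -sG --data-urlencode "url=$1" 'https://archive.org/wayback/available'
}
```

A typical pass would feed each URL from missing_from_log into wayback_lookup, then run remaining_links on the mirror directory to see which files still point at the live site.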

One can then manually try to find & replace missing files, clean up links, etc.

REFERENCE