A Python tool for archiving web pages through the Internet Archive Wayback Machine
Installing this command-line tool with pipx (or a similar tool) is recommended.
pipx install wayback-machine-saver
Save URLs from the input file to Internet Archive - Wayback Machine
wayback-machine-saver save-pages FILENAME
- FILENAME: path to the file containing the URLs to save
e.g.,
https://example.com
https://another-example.com
- --deliminator TEXT [default: "\n"]
- --error-log-filename TEXT [default: save-pages-error-log-"timestamp".csv]
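The input format above (URLs separated by a delimiter, newline by default) can be illustrated with a minimal sketch. The helper name `split_urls` is hypothetical and is not taken from the tool's source; it only shows how such a file might be split, and the tool's actual parsing may differ.

```python
def split_urls(text: str, deliminator: str = "\n") -> list[str]:
    """Split raw file content into non-empty, whitespace-stripped URLs.

    Hypothetical helper for illustration only; not the tool's real code.
    """
    return [part.strip() for part in text.split(deliminator) if part.strip()]


# Content equivalent to the example input file shown above.
content = "https://example.com\nhttps://another-example.com\n"
urls = split_urls(content)
```

Passing a different `deliminator` (for example `","`) would split a single-line, comma-separated file in the same way.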
After the URLs have been saved, Internet Archive - Wayback Machine takes a snapshot of each page, stores it in its database, and assigns it a timestamp. You can access the latest snapshot through http://web.archive.org/web/[Your URL], which redirects to http://web.archive.org/web/[timestamp]/[Your URL]. This command retrieves the redirected URLs.
wayback-machine-saver get-latest-archive-urls FILENAME
- FILENAME: path to the file containing the URLs to retrieve
e.g.,
https://example.com
https://another-example.com
- --deliminator TEXT [default: "\n"]
- --output-filename TEXT [default: retrieved-urls-"timestamp".csv]
- --error-log-filename TEXT [default: get-url-error-log-"timestamp".csv]
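The URL scheme described for this command can be sketched as follows. The function names are hypothetical and the regular expression assumes the timestamp segment consists only of digits; this is an illustration of the URL shapes, not the tool's actual implementation, and it performs no network requests.

```python
import re
from typing import Optional, Tuple

WAYBACK_PREFIX = "http://web.archive.org/web/"

# Assumption: the timestamp path segment is made up of digits only.
SNAPSHOT_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d+)/(.+)$")


def latest_archive_lookup_url(url: str) -> str:
    """Build the address that redirects to the latest snapshot of `url`."""
    return f"{WAYBACK_PREFIX}{url}"


def parse_snapshot_url(snapshot_url: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original URL), or None if the shape does not match."""
    match = SNAPSHOT_PATTERN.match(snapshot_url)
    if match is None:
        return None
    return match.group(1), match.group(2)


lookup = latest_archive_lookup_url("https://example.com")
parsed = parse_snapshot_url(
    "http://web.archive.org/web/20230101000000/https://example.com"
)
```

Here `lookup` is the address a client would request, and `parsed` splits an already redirected URL back into its timestamp and original URL.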
Wayback Machine Saver supports configuration through environment variables. You can run export VARIABLE=VALUE before running the script to change its behavior.
- WAYBACK_MACHINE_SAVER_RETRY_TIMES
- number of times to retry (default: 3)
- HTTPX_TIMEOUT
- timeout for all GET operations (default: 10)
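As a rough sketch of how these two variables and their defaults might be read, assuming the retry count is parsed as an integer and the timeout as a number: the function `load_settings` is hypothetical and the tool's real parsing may differ.

```python
import os
from typing import Mapping, Tuple


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, float]:
    """Read retry count and timeout, falling back to the documented defaults.

    Hypothetical helper; the int/float conversions are assumptions.
    """
    retry_times = int(env.get("WAYBACK_MACHINE_SAVER_RETRY_TIMES", "3"))
    timeout = float(env.get("HTTPX_TIMEOUT", "10"))
    return retry_times, timeout


# With neither variable set, the documented defaults apply.
defaults = load_settings({})

# Equivalent to exporting both variables before running the script.
overridden = load_settings(
    {"WAYBACK_MACHINE_SAVER_RETRY_TIMES": "5", "HTTPX_TIMEOUT": "2.5"}
)
```

Accepting the environment as a mapping keeps the sketch easy to exercise without modifying the real process environment.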
See Contributing
Wei Lee weilee.rx@gmail.com
Created from Lee-W/cookiecutter-python-template version 0.9.0