internet-archiving
There are 25 repositories under the internet-archiving topic.
ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
akamhy/waybackpy
Wayback Machine API interface & a command-line tool
pirate/wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
ArchiveBox/good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
vegetableman/vandal
Navigator for Web Archive
mikwielgus/forum-dl
Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC
pirate/internet-archiving-talk
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
ArchiveBox/docker-archivebox
Home of the official docker image for ArchiveBox
Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
ArchiveBox/readability-extractor
JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
ArchiveBox/homebrew-archivebox
Homebrew formula for the ArchiveBox self-hosted internet archiving solution.
ArchiveBox/debian-archivebox
Home of the official apt/deb package for Ubuntu/Debian-based systems.
ArchiveBox/DigestBox
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.
ArchiveBox/archivebox-proxy
Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.
ArchiveBox/docs
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.
ArchiveBox/pip-archivebox
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
itsliamdowd/WaybackBrowserMacOS
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
itsliamdowd/WaybackBrowserWindows
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Quoorex/archive-file-urls
Submit URLs listed inside a file to website archival services
gabldotink/sharkive.old
Upload stuff to the Internet Archive using a shell script
httpreserve/conventoarchiver
Repository for collecting scripts to help capture MyConvento newsroom press-releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop a solution for themselves.
TheLovinator1/FeedVault.se
FeedVault is an open-source web application that allows users to archive and search their favorite web feeds.
Fooftilly/RSS_archiver
Download and archive RSS feeds to the Wayback Machine. Saves a list of archived feeds in a local database.