internet-archiving

There are 25 repositories under the internet-archiving topic.

ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Language:Python21k 175 9031.1k
akamhy/waybackpy
Wayback Machine API interface & a command-line tool
Language:Python463 10 8334
pirate/wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Language:Shell351 8 430
ArchiveBox/good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
301 7 28
ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
Language:TypeScript210 9 2518
ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
Language:JavaScript177 8 615
vegetableman/vandal
Navigator for Web Archive
Language:JavaScript156 6 86
mikwielgus/forum-dl
Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC
Language:Python69 4 182
pirate/internet-archiving-talk
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
Language:JavaScript48 5 05
ArchiveBox/docker-archivebox
Home of the official docker image for ArchiveBox
Language:Dockerfile46 3 112
Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
Language:JavaScript36 2 40
ArchiveBox/readability-extractor
JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
Language:JavaScript35 4 213
ArchiveBox/homebrew-archivebox
Homebrew formula for the ArchiveBox self-hosted internet archiving solution.
Language:Ruby26 3 03
ArchiveBox/debian-archivebox
Home of the official apt/deb package for Ubuntu/Debian-based systems.
Language:Python17 3 25
ArchiveBox/DigestBox
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.
Language:HTML15 2 10
ArchiveBox/archivebox-proxy
Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.
Language:Python14 2 00
ArchiveBox/docs
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.
Language:CSS14 3 13
ArchiveBox/pip-archivebox
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
13 2 02
itsliamdowd/WaybackBrowserMacOS
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Language:Swift8 3 41
itsliamdowd/WaybackBrowserWindows
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Language:Python5 1 00
Quoorex/archive-file-urls
Submit URLs listed inside a file to website archival services
Language:Python3 2 00
gabldotink/sharkive.old
upload stuff to the Internet Archive using a shell script
Language:Shell1 1 10
httpreserve/conventoarchiver
Repository for collecting scripts to help capture MyConvento newsroom press-releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop a solution for themselves.
Language:Python1 3 00
TheLovinator1/FeedVault.se
FeedVault is an open-source web application that allows users to archive and search their favorite web feeds.
Language:Python1 0 50
Fooftilly/RSS_archiver
Download and archive RSS feeds to the Wayback Machine. Save a list of archived feeds in a local db.
Language:Python0 2 00
