internet-archiving

There are 25 repositories under the internet-archiving topic.

  • ArchiveBox

    ArchiveBox/ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

    Language: Python
  • waybackpy

    akamhy/waybackpy

    Wayback Machine API interface & a command-line tool

    Language: Python
  • pirate/wikipedia-mirror

    🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

    Language: Shell
  • good-karma-kit

    ArchiveBox/good-karma-kit

    😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

  • ArchiveBox/archivebox-browser-extension

    Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

    Language: TypeScript
  • ArchiveBox/electron-archivebox

    Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

    Language: JavaScript
  • vegetableman/vandal

    Navigator for Web Archive

    Language: JavaScript
  • mikwielgus/forum-dl

    Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC

    Language: Python
  • pirate/internet-archiving-talk

    🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

    Language: JavaScript
  • ArchiveBox/docker-archivebox

    Home of the official docker image for ArchiveBox

    Language: Dockerfile
  • Own-Data-Privateer/hoardy-web

    Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

    Language: JavaScript
  • ArchiveBox/readability-extractor

    JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.

    Language: JavaScript
  • ArchiveBox/homebrew-archivebox

    Homebrew formula for the ArchiveBox self-hosted internet archiving solution.

    Language: Ruby
  • ArchiveBox/debian-archivebox

    Home of the official apt/deb package for Ubuntu/Debian-based systems.

    Language: Python
  • DigestBox

    ArchiveBox/DigestBox

    DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.

    Language: HTML
  • ArchiveBox/archivebox-proxy

    Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.

    Language: Python
  • ArchiveBox/docs

    Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.

    Language: CSS
  • ArchiveBox/pip-archivebox

    Official Python package for ArchiveBox, the self-hosted internet archiving solution.

  • itsliamdowd/WaybackBrowserMacOS

    Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

    Language: Swift
  • itsliamdowd/WaybackBrowserWindows

    Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

    Language: Python
  • Quoorex/archive-file-urls

    Submit URLs listed inside a file to website archival services

    Language: Python
  • gabldotink/sharkive.old

    Upload files to the Internet Archive using a shell script

    Language: Shell
  • httpreserve/conventoarchiver

    Repository for collecting scripts to help capture MyConvento newsroom press-releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop a solution for themselves.

    Language: Python
  • TheLovinator1/FeedVault.se

    FeedVault is an open-source web application that allows users to archive and search their favorite web feeds.

    Language: Python
  • Fooftilly/RSS_archiver

    Download and archive RSS feeds to the Wayback Machine. Saves a list of archived feeds in a local DB.

    Language: Python