web-archiving
There are 111 repositories under the web-archiving topic.
ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Rhizome-Conifer/conifer
Collect and revisit web pages.
webrecorder/pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
webrecorder/archiveweb.page
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
webrecorder/replayweb.page
Serverless replay of web archives directly in the browser
webrecorder/browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
oduwsdl/ipwb
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
bellingcat/auto-archiver
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
akamhy/waybackpy
Wayback Machine API interface & a command-line tool
webrecorder/webrecorder-player
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
rahiel/archiveror
Archiveror will help you preserve the webpages you love. 💾
harvard-lil/perma
Indelible links
oduwsdl/archivenow
A Tool To Push Web Resources Into Web Archives
Florents-Tselai/WarcDB
WarcDB: Web crawl data as SQLite databases.
webrecorder/warcio
Streaming WARC/ARC library for fast web archive IO
machawk1/wail
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
machawk1/warcreate
Chrome extension to "Create WARC files from any webpage"
ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
webrecorder/browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
cocrawler/cdx_toolkit
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
gwu-libraries/sfm-ui
Social Feed Manager user interface application.
helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
programminghistorian/ph-submissions
The repository and website hosting the peer review process for new Programming Historian lessons
N0taN3rd/wail
🐋 One-Click User Instigated Preservation
internetarchive/fatcat
Perpetual Access To The Scholarly Record
maxcountryman/warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
oduwsdl/warrick
Recover lost websites from the Web Infrastructure
xarantolus/Collect
A server to collect & archive websites that also supports video downloads
oduwsdl/MemGator
A Memento Aggregator CLI and Server in Go
pirate/internet-archiving-talk
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
TarekJor/bookmark-archiver
🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...
nla/outbackcdx
Web archive index server based on RocksDB