web-archiving

There are 121 repositories under the web-archiving topic.

ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Language:Python25.6k 175 1k1.4k
webrecorder/pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
Language:JavaScript1.6k 57 503238
Rhizome-Conifer/conifer
Collect and revisit web pages.
Language:Python1.5k 50 398123
webrecorder/archiveweb.page
A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!
Language:TypeScript1.1k 19 18181
gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Language:JavaScript1k 10 142100
bellingcat/auto-archiver
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Language:Python981 24 14789
webrecorder/browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
Language:TypeScript914 22 408121
Ray-D-Song/web-archive
Free web archiving and sharing service based on Cloudflare. (A free web page archiving and sharing tool that runs on Cloudflare.)
Language:TypeScript896 7 35295
webrecorder/replayweb.page
Serverless replay of web archives directly in the browser
Language:TypeScript857 15 19779
oduwsdl/ipwb
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Language:Python647 20 51742
akamhy/waybackpy
Wayback Machine API interface & a command-line tool
Language:Python552 8 8636
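waybackpy wraps the Wayback Machine's public APIs. The sketch below is not waybackpy's own interface. It is a stdlib-only illustration of one such API, the Internet Archive's Wayback Availability JSON API (`https://archive.org/wayback/available`), and it assumes the documented response shape in which `archived_snapshots.closest` holds the nearest capture. The function names `availability_url` and `closest_snapshot` are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Endpoint of the Internet Archive's Wayback Availability JSON API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build a query URL; `timestamp` is an optional YYYYMMDDhhmmss prefix."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if nothing is archived."""
    data = json.loads(response_body)
    # Assumed response shape: {"archived_snapshots": {"closest": {...}}};
    # "closest" is absent when the page has no captures.
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Fetching `availability_url("example.com", "20060101")` (for instance with `urllib.request.urlopen`) and passing the decoded body to `closest_snapshot` yields a snapshot URL, or `None` when there are no captures. The network call is left out so the parsing logic can be tested offline.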
eclaire-labs/eclaire
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
Language:TypeScript50251
harvard-lil/perma
Indelible links
Language:JavaScript488 23 1.7k81
rahiel/archiveror
Archiveror will help you preserve the webpages you love. 💾
Language:JavaScript449 18 5742
webrecorder/webrecorder-player
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
Language:JavaScript446 36 5242
webrecorder/warcio
Streaming WARC/ARC library for fast web archive IO
Language:Python438 21 8965
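For reading, warcio's own entry point is `ArchiveIterator` (in `warcio.archiveiterator`). To show what streaming WARC parsing involves, here is a stdlib-only sketch, not warcio's implementation, that makes a few simplifying assumptions: records are uncompressed, header lines are not folded, and header names are matched case-sensitively. Each record consists of a `WARC/x.y` version line, `Name: value` header lines, a blank line, a content block of exactly `Content-Length` bytes, and a CRLF CRLF terminator. `iter_warc_records` and `read_warc_bytes` are hypothetical names.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

Record = Tuple[str, Dict[str, str], bytes]


def iter_warc_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield (version, headers, block) for each uncompressed WARC record."""
    while True:
        version_line = stream.readline()
        if not version_line:
            return  # clean end of stream
        if not version_line.strip():
            continue  # tolerate stray blank lines between records
        version = version_line.decode("ascii").strip()
        if not version.startswith("WARC/"):
            raise ValueError(f"expected WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            # Split on the first colon only; values such as URIs contain colons.
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Read exactly the declared block length, then skip CRLF CRLF.
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)
        yield version, headers, block


def read_warc_bytes(data: bytes) -> List[Record]:
    """Convenience wrapper for parsing an in-memory WARC file."""
    return list(iter_warc_records(io.BytesIO(data)))
```

Because `iter_warc_records` is a generator that reads one record at a time, memory use is bounded by the largest record rather than by the whole file, which is the point of a streaming reader.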
oduwsdl/archivenow
A Tool To Push Web Resources Into Web Archives
Language:Python423 19 3940
Florents-Tselai/WarcDB
WarcDB: Web crawl data as SQLite databases.
Language:Python406 8 1910
machawk1/wail
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Language:Roff381 10 46138
ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
Language:JavaScript375 8 3438
webrecorder/browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Language:TypeScript352 9 1.3k60
machawk1/warcreate
Chrome extension to "Create WARC files from any webpage"
Language:JavaScript224 15 12115
cocrawler/cdx_toolkit
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Language:Python186 9 2834
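A CDX index stores one line per capture. As an illustrative sketch (not cdx_toolkit's API), the parser below assumes the seven space-separated fields that the Internet Archive's Wayback CDX server returns by default: `urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, `digest`, `length`. Common Crawl's index uses a different, JSON-based line format that this sketch does not handle. `CdxCapture` and `parse_cdx_line` are hypothetical names. The replay URL follows the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` pattern.

```python
from dataclasses import dataclass

# Assumed default field order of the Wayback CDX server's plain-text output.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length")


@dataclass(frozen=True)
class CdxCapture:
    # Values stay strings so placeholder entries do not break parsing.
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str

    def replay_url(self) -> str:
        """Wayback Machine replay URL for this capture."""
        return f"https://web.archive.org/web/{self.timestamp}/{self.original}"


def parse_cdx_line(line: str) -> CdxCapture:
    """Parse one space-separated CDX line into a CdxCapture."""
    parts = line.split()
    if len(parts) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(parts)}")
    return CdxCapture(**dict(zip(CDX_FIELDS, parts)))
```

The `urlkey` field is a SURT-style canonical key (e.g. `com,example)/`), which sorts captures of the same site together. Lookups in a sorted CDX file can therefore use binary search on that key.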
ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
Language:JavaScript178 6 615
gwu-libraries/sfm-ui
Social Feed Manager user interface application.
Language:Python156 26 80226
helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Language:Scala153 15 2519
programminghistorian/ph-submissions
The repository and website hosting the peer review process for new Programming Historian lessons
Language:HTML147 45 435115
N0taN3rd/wail
🐋 One-Click User Instigated Preservation
Language:JavaScript129 11 1059
internetarchive/fatcat
Perpetual Access To The Scholarly Record
Language:Python119 15 8318
maxcountryman/warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
Language:Rust113 3 21
N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
Language:JavaScript102 7 1522
Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
Language:Python101 3 199
oduwsdl/warrick
Recover lost websites from the Web Infrastructure
Language:HTML89 7 1610
xarantolus/Collect
A server to collect & archive websites that also supports video downloads
Language:TypeScript86 4 2312
PKHarsimran/website-downloader
Website-downloader is a powerful and versatile Python script designed to download entire websites along with all their assets. This tool allows you to create a local copy of a website, including HTML pages, images, CSS, JavaScript files, and other resources. It is ideal for web archiving, offline browsing, and web development.
Language:Python81 4 420
oduwsdl/MemGator
A Memento Aggregator CLI and Server in Go
Language:Go70 10 12511
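MemGator aggregates Mementos (RFC 7089) from several archives. A core step in any such aggregator is reading link-format TimeMaps and choosing the memento closest to a requested datetime. The sketch below is a simplified, hypothetical illustration of that step, not MemGator's Go code (it is written in Python for consistency with the other examples). It assumes every attribute value is quoted, treats any `rel` containing the `memento` token (`first memento`, `last memento`, ...) as a memento, and takes the requested time in the RFC 1123 form used by the `Accept-Datetime` header. Picking the smallest absolute time difference is one common policy, not something the protocol mandates.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# One link-value: <URI> followed by ;-separated attributes with quoted values.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(body: str) -> List[Tuple[str, datetime]]:
    """Return (memento URI, datetime) pairs from a link-format TimeMap."""
    mementos = []
    for uri, attr_text in LINK_RE.findall(body):
        attrs = dict(ATTR_RE.findall(attr_text))
        # rel may hold several space-separated tokens, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


def negotiate(body: str, accept_datetime: str) -> str:
    """Return the memento nearest to an RFC 1123 Accept-Datetime value."""
    target = parsedate_to_datetime(accept_datetime)
    uri, _ = min(parse_timemap(body), key=lambda m: abs(m[1] - target))
    return uri
```

Regex matching rather than splitting on commas matters here, because the `datetime` values themselves contain commas (`Sun, 20 Jan 2002 ...`).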
