Internet Archive
The Internet Archive is "the library of the Internet", and a big supporter of Free Software.
San Francisco
Pinned Repositories
bookreader
The Internet Archive BookReader
brozzler
brozzler - distributed browser-based web crawler
cicd
Build and test using the GitHub registry; deploy to Nomad clusters
dweb-mirror
Offline Internet Archive project
heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
openlibrary
One webpage for every book ever published!
openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
warcprox
WARC-writing MITM HTTP/S proxy
wayback
IA's public Wayback Machine (moved from SourceForge)
wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
Internet Archive's Repositories
internetarchive/openlibrary
One webpage for every book ever published!
internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
internetarchive/bookreader
The Internet Archive BookReader
internetarchive/brozzler
brozzler - distributed browser-based web crawler
internetarchive/wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
internetarchive/warcprox
WARC-writing MITM HTTP/S proxy
internetarchive/openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
internetarchive/internetarchivebot
internetarchive/iaux
Monorepo for Archive.org UX development and prototyping.
internetarchive/openlibrary-bots
A repository of cleanup bots implementing the openlibrary-client
internetarchive/hind
Hashistack-IN-Docker (single container with nomad + consul + caddy)
internetarchive/Zeno
State-of-the-art web crawler 🔱
internetarchive/infogami
internetarchive/wayback-diff
React components to render differences between captures at the Wayback Machine
internetarchive/iiif
The official Internet Archive IIIF service
internetarchive/arch
Web application for distributed compute analysis of Archive-It web archive collections.
internetarchive/iari
Import workflows for the Wikipedia Citations Database
internetarchive/Sparkling
Internet Archive's Sparkling Data Processing Library
internetarchive/iaux-typescript-wc-template
IAUX TypeScript WebComponent Template
internetarchive/iare
An interactive IARI JSON viewer
internetarchive/dyno
internetarchive/iaux-collection-browser
internetarchive/openlibrary-api
API documentation for https://github.com/internetarchive/openlibrary
internetarchive/ads-common
Common components and utilities for the Archiving & Data Services (ADS) team at the Internet Archive
internetarchive/archiveorg-e2e-playwright
internetarchive/rclone
[vault fork] of "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, OneDrive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
internetarchive/iaux-book-actions
IA lending bar controls for bookreader
internetarchive/iaux-modal-manager
A Modal Manager WebComponent
internetarchive/wbm_ai_kg
Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Knowledge Graph project
internetarchive/wbm_ai_sum
Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Archival Summary project