Internet Archive
The Internet Archive is "the library of the Internet", and a big supporter of Free Software.
San Francisco
Pinned Repositories
bookreader
The Internet Archive BookReader
brozzler
brozzler - distributed browser-based web crawler
cicd
Build and test using the GitHub registry; deploy to Nomad clusters.
heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
openlibrary
One webpage for every book ever published!
openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
warcprox
WARC writing MITM HTTP/S proxy
wayback
IA's public Wayback Machine (moved from SourceForge)
wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
Zeno
State-of-the-art web crawler 🔱
Internet Archive's Repositories
internetarchive/openlibrary
One webpage for every book ever published!
internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
internetarchive/bookreader
The Internet Archive BookReader
internetarchive/brozzler
brozzler - distributed browser-based web crawler
internetarchive/Zeno
State-of-the-art web crawler 🔱
internetarchive/internetarchivebot
internetarchive/openlibrary-bots
A repository of cleanup bots implementing the openlibrary-client
internetarchive/iaux
Monorepo for Archive.org UX development and prototyping.
internetarchive/hind
Hashistack-IN-Docker (single container with nomad + consul + caddy)
internetarchive/infogami
internetarchive/gowarc
Read and write WARC files in Go
internetarchive/crawling-for-nomore404
internetarchive/iiif
The official Internet Archive IIIF service
internetarchive/doppelganger
URL-agnostic WARC dedupe server
internetarchive/iaux-typescript-wc-template
IAUX Typescript WebComponent Template
internetarchive/iaux-collection-browser
internetarchive/nomad
CI/CD code to manage and deploy to Nomad clusters. CI/CD uses a GitHub Actions reusable workflow; the deploy phase sends just-built containers to a Nomad cluster. Contains helpful aliases for developers, including "hot sync" of code into deployments.
internetarchive/openlibrary-api
API documentation for https://github.com/internetarchive/openlibrary
internetarchive/iaux-search-service
internetarchive/iaux-donation-form
The Internet Archive Donation Form
internetarchive/iaux-metadata-service
A service for fetching metadata about items in the Internet Archive
internetarchive/iaux-modal-manager
A Modal Manager WebComponent
internetarchive/iaux-histogram-date-range
Internet Archive histogram-date-range picker
internetarchive/iaux-item-metadata
internetarchive/iaux-monthly-giving-circle
internetarchive/iaux-notification-toast
Displays notifications and automatically clears them.
internetarchive/iaux-reviews
Web component for displaying and editing Internet Archive reviews
internetarchive/iaux-icons
SVG icons and ia-icon component monorepo
internetarchive/iaux-feature-feedback
internetarchive/tvnews_socialmedia_mentions
Google Summer of Code (GSoC) 2025 TV News Archive Social Media Mentions project